Introduction


The topic for this semester is algebraic methods in extremal combinatorics. Lectures for this class will be recorded 
in case any of us need to miss class (for COVID or other reasons), but we should still make sure to attend in person. 
If we need to miss class due to COVID isolation, we should contact the course staff or find classmates who are taking 
notes. (And if we're feeling sick or need help for any reason, we should contact S^3 or GradSupport, who will advocate
on our behalf.) 

Logistically, office hours will be held in 2-171 from 1-2 on Mondays, and grading will be done using five (biweekly) 
problem sets. Homework will always appear on Canvas, and submission should be done on Gradescope. (But if this 
causes any problems for us, we should email Professor Sauermann and we can figure something out). Collaboration is 
encouraged, but we should think about the problems ourselves, and we should only write down a solution if it came 
out of a discussion that we were actively involved in (in other words, we can’t ask “what is the solution”). We should 
always make sure to indicate our collaborators at the start of a solution. 

The syllabus also includes a schedule and list of topics, which is subject to change; we can take a look on Canvas to see the specifics there. There are no required references, so in theory we should be fine just showing up to lecture and learning here!


Fact 1 


Professor Sauermann had the students introduce ourselves by name, year, and major. 


1 January 31, 2022 


In the first three weeks of this class, we'll discuss methods in linear algebra, starting with a classic example: 


Problem 2 


Consider a town of n citizens who form (potentially intersecting) clubs, subject to the following rules: 


• Each club has an odd number of members.

• Any two clubs have an even number of members in common.


What is the maximum number of clubs we can form in this town (as a function of n)? 


One potential situation could be that we have n clubs, and each citizen is in a different club (this satisfies the conditions above). But if we try to start forming larger groups, we start running into problems: even if we try to form clubs of size 3 to get more than n total clubs, it’s hard to come up with constructions without having two clubs that intersect in just a single person. It turns out that we can't actually improve the simple construction that gets us n clubs:


Theorem 3 (Odd-town theorem) 


Under the assumptions of Problem 2, there cannot be more than n clubs. Removing the flavor text, if we have (distinct) subsets $C_1, \dots, C_m \subseteq \{1, \dots, n\}$ such that $|C_i|$ is odd for all $i$ and $|C_i \cap C_j|$ is even for all $1 \le i \ne j \le m$, then $m \le n$.


Proof. (We have already achieved $m = n$ by construction.) We'll first transform the setup into a linear algebraic one. Number the citizens of the town from 1 to n, and for each club, form a vector in $\{0,1\}^n$ such that the $c$th entry of the vector is 1 if citizen $c$ is a member of the club and 0 otherwise (this is known as the incidence vector of the club). If we have m clubs, then we get m vectors $v_1, \dots, v_m$ in $\{0,1\}^n$, and we wish to show that $m \le n$.

Since each club has an odd number of members, each $v_i$ has an odd number of ones, and since any two clubs have an even number of common members, $v_i \cdot v_j$ must be even for all $i \ne j$ (because this dot product counts the number of indices where $v_i$ and $v_j$ are both 1). Motivated by that, notice that we can restate our first condition as $v_i \cdot v_i$ being odd for all $i$.

Since dealing with even and odd-ness can be annoying, we can instead interpret $v_1, \dots, v_m$ as vectors in $\mathbb{F}_2^n$, where $\mathbb{F}_2$ is the field with two elements, so that $v_i \cdot v_j$ is 1 if $i = j$ and 0 otherwise. Thus it suffices to show that the $v_i$s are linearly independent, from which the result follows (by finite-dimensional linear algebra). Indeed, suppose $\lambda_1 v_1 + \cdots + \lambda_m v_m = 0$ for some coefficients $\lambda_1, \dots, \lambda_m \in \mathbb{F}_2$. Taking the dot product of both sides with $v_i$, most terms disappear because of the orthogonality we described above. So we find $\lambda_i (v_i \cdot v_i) = 0$, so $\lambda_i = 0$ for all $i$, which is what we need to show linear independence, and thus we can have at most $\dim(\mathbb{F}_2^n) = n$ total clubs.
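As a sanity check on this argument, here is a minimal Python sketch (not part of the lecture) that brute-forces small towns. It assumes clubs are encoded as integer bitmasks, with bit c-1 set when citizen c is a member; the helper names `gf2_rank`, `is_odd_town`, and `max_odd_town` are made up for this illustration.

```python
from itertools import combinations


def gf2_rank(vectors):
    """Rank over F_2 of vectors given as integer bitmasks."""
    basis = []  # reduced vectors with pairwise distinct leading bits
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)  # clears b's leading bit from v when it is set
        if v:
            basis.append(v)
    return len(basis)


def is_odd_town(family):
    """Every club has odd size and any two clubs share an even number of members."""
    odd_sizes = all(bin(c).count("1") % 2 == 1 for c in family)
    even_overlaps = all(
        bin(a & b).count("1") % 2 == 0 for a, b in combinations(family, 2)
    )
    return odd_sizes and even_overlaps


def max_odd_town(n):
    """Largest valid family on n citizens, found by exhaustive search (small n only)."""
    odd_sets = [c for c in range(1, 2 ** n) if bin(c).count("1") % 2 == 1]
    best = 0
    for size in range(1, len(odd_sets) + 1):
        for family in combinations(odd_sets, size):
            if is_odd_town(family):
                # The proof predicts that valid families are independent over F_2.
                assert gf2_rank(family) == size
                best = max(best, size)
    return best
```

The search is exponential in n, so it is only meant for values like n = 3 or n = 4, where it agrees with the bound m ≤ n.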


Extremal combinatorics is a field which studies questions of the form “given some configuration of graphs or sets or other combinatorial objects, under certain conditions, what is the maximum or minimum size of a configuration?” So the result we've just proved is a prototypical example of the kind of results we'll be seeing in this class.


Problem 4 


We'll now ask what happens if we adjust the conditions of our problem statement a little bit, replacing “odd” in the odd-town theorem by “even” (so that club sizes and pairwise intersections are all even). We still wish to find the maximum number of clubs that this town may have.


The construction from above no longer works, but we can still use a variation of it: if we take disjoint sets of size 2, we get $\lfloor n/2 \rfloor$ sets (and in extremal combinatorics we often don't really care about constant factors like 2, since asymptotic considerations are often the best we can do). But this time we can do better: if we take unions of these pairs, we still satisfy the conditions in the problem, and in fact we can take any set of these pairs and that will still work. (In other words, imagine that we have $\lfloor n/2 \rfloor$ married couples, and each club contains either both members of a couple or neither.) Then all $2^{\lfloor n/2 \rfloor}$ of these potential clubs will be of even size and have even intersection (because we always just have a set of couples in each case), and we can notice that this small change to the odd-town problem statement has led us to a dramatically different answer!
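The couples construction is easy to check by machine. The following Python sketch (an illustration, not part of the lecture) assumes that the couples are {1,2}, {3,4}, and so on, with a leftover single citizen when n is odd; the function names are hypothetical.

```python
from itertools import combinations


def couples_family(n):
    """All unions of the couples {1,2}, {3,4}, ..., including the empty club."""
    couples = [frozenset({2 * k + 1, 2 * k + 2}) for k in range(n // 2)]
    family = []
    for size in range(len(couples) + 1):
        for chosen in combinations(couples, size):
            family.append(frozenset().union(*chosen))
    return family


def is_even_town(family):
    """Every club has even size and any two clubs share an even number of members."""
    even_sizes = all(len(c) % 2 == 0 for c in family)
    even_overlaps = all(len(a & b) % 2 == 0 for a, b in combinations(family, 2))
    return even_sizes and even_overlaps
```

For instance, `couples_family(7)` uses three couples and therefore produces 2^3 = 8 distinct clubs.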


Theorem 5 (Even-town theorem) 


Suppose we have distinct subsets $C_1, \dots, C_m \subseteq \{1, \dots, n\}$ such that $|C_i|$ and $|C_i \cap C_j|$ are even for all $1 \le i \ne j \le m$. Then $m \le 2^{\lfloor n/2 \rfloor}$.


Proof. (We have already achieved $m = 2^{\lfloor n/2 \rfloor}$ by construction.) We again form the incidence vectors $v_1, \dots, v_m \in \mathbb{F}_2^n$ of the sets $C_1, \dots, C_m$, in the same way as before. We may consider the span of these vectors $U = \mathrm{span}(v_1, \dots, v_m)$; we can verify that we can always add new subsets in this span $U$ and still preserve the theorem conditions, but we don't need that result directly. Instead, we can just observe that $m \le |U| = 2^{\dim U}$ (because m is the number of vectors in $\{v_1, \dots, v_m\}$, while $|U|$ is the number of vectors in its span), so it suffices to show that $\dim U \le \lfloor n/2 \rfloor$.

This time, we have $v_i \cdot v_j = 0$ for all $i, j$ by the theorem statement. Notice that for any $u, u' \in U$, we have $u \cdot u' = 0$, because we can write out $u = \lambda_1 v_1 + \cdots + \lambda_m v_m$ and $u' = \lambda'_1 v_1 + \cdots + \lambda'_m v_m$ and find that (by distributivity)

$$u \cdot u' = \sum_{i,j=1}^{m} \lambda_i \lambda'_j \, (v_i \cdot v_j) = \sum_{i,j=1}^{m} \lambda_i \lambda'_j \cdot 0 = 0,$$

where this calculation is being done in $\mathbb{F}_2$. (A subspace $U$ with this property is known as a totally isotropic subspace.) We can now consider the dual space $\mathcal{L}(\mathbb{F}_2^n)$, which is the set of all linear forms on $\mathbb{F}_2^n$ (that is, the set of linear maps $\mathbb{F}_2^n \to \mathbb{F}_2$). This vector space is also of dimension n, just like $\mathbb{F}_2^n$. Now for any vector $v \in \mathbb{F}_2^n$, consider its corresponding linear form $\phi_v \in \mathcal{L}(\mathbb{F}_2^n)$ defined by

$$\phi_v(x) = v \cdot x.$$


(In other words, the linear form $\phi_v$ is the “dotting with v” map.) Sending $v$ to $\phi_v$ can then be represented as a map $\Phi : \mathbb{F}_2^n \to \mathcal{L}(\mathbb{F}_2^n)$, and in fact (we can check mechanically that) this $\Phi$ is itself a linear map.

In fact, $\Phi$ is an isomorphism. To show this, because we already know that $\mathbb{F}_2^n$ and $\mathcal{L}(\mathbb{F}_2^n)$ are vector spaces of dimension n, we just need to show injectivity. Indeed, if $\Phi$ mapped some vector $v$ to the zero map, that would mean that $\phi_v(x) = v \cdot x = 0$ for all $x \in \mathbb{F}_2^n$. But whenever $v$ has some nonzero coordinate $i$, we get a contradiction by picking $x$ to be the $i$th standard basis vector. Thus $\Phi$ has a trivial kernel and is injective.


Remark 6. We may have been told in our linear algebra classes that there is no canonical isomorphism between a vector space and its dual (only a vector space and its double dual). While mapping $v \mapsto \phi_v$ may look very natural, this map $\Phi$ is not canonical because it depends on an inner product, which depends on some choice of basis.


We're now ready to return to the proof: let $W$ be the subspace of $\mathcal{L}(\mathbb{F}_2^n)$ consisting of all linear forms $\phi \in \mathcal{L}(\mathbb{F}_2^n)$ such that $\phi(u) = 0$ for all $u \in U$. We claim that $\Phi(U) \subseteq W$; indeed, for any $u' \in U$, we wish to show that $\phi_{u'} \in W$, and that's true because $\phi_{u'}(u) = u' \cdot u = 0$ by our earlier calculation. Because $\Phi$ is an isomorphism (in particular injective) that maps $U$ into $W$, we know that $\boxed{\dim U \le \dim W}$. Furthermore, $W$ is the kernel of the linear map $\mathcal{L}(\mathbb{F}_2^n) \to \mathcal{L}(U)$ defined by restriction (in other words, taking any linear form $\phi$ and only looking at its action $\phi|_U$ on $U$). This linear map is surjective, because we can extend any linear form on $U$ to one on $\mathbb{F}_2^n$ (pick a basis of $U$ and then complete it to a basis of $\mathbb{F}_2^n$), so the rank-nullity theorem tells us that

$$\dim W = \dim \mathcal{L}(\mathbb{F}_2^n) - \dim \mathcal{L}(U) = n - \dim U.$$

Thus plugging this back into the boxed inequality above, and using the fact that $\dim U$ is an integer,

$$\dim U \le n - \dim U \implies 2 \dim U \le n \implies \dim U \le \lfloor n/2 \rfloor,$$

which was the result that we wished to show.


2 February 3, 2022 


Last time, we discussed the odd-town and even-town theorems, which described the maximum number of (odd and even, respectively)-size subsets of $\{1, 2, \dots, n\}$, where any two subsets have an even-size intersection. We found that this maximum was $n$ and $2^{\lfloor n/2 \rfloor}$, respectively, and in both cases we used linear algebra to show the result.


The next natural extension is to ask any two subsets to have an odd-size intersection: 


Problem 7 
Consider distinct subsets $C_1, \dots, C_m \subseteq \{1, 2, \dots, n\}$, such that $|C_i|$ is even for all $1 \le i \le m$ and $|C_i \cap C_j|$ is odd for all $i \ne j$. What is the maximum possible value of m?


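A simple construction we can consider is $C_1 = \{1, 2\}$, $C_2 = \{1, 3\}$, $C_3 = \{1, 4\}, \dots, C_{n-1} = \{1, n\}$, which gives us $m = n - 1$. (In each case, the intersection $C_i \cap C_j$ only contains 1.) Furthermore, we can also add the set $C_n = \{2, 3, \dots, n\}$ as long as n is odd (so that $|C_n| = n - 1$ is even; this set meets each $\{1, k\}$ in the single element $k$). But that’s actually the best we can do: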


Theorem 8 


Under the settings of Problem 7, the maximum value of m is $n$ for odd n and $n - 1$ for even n.
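The constructions achieving these two values can be verified directly. Below is a small Python sketch of that check (an illustration only; `star_family` and `is_valid` are names invented here).

```python
from itertools import combinations


def star_family(n):
    """The sets {1,2}, {1,3}, ..., {1,n}, plus {2,...,n} when n is odd."""
    family = [frozenset({1, k}) for k in range(2, n + 1)]
    if n % 2 == 1:
        family.append(frozenset(range(2, n + 1)))
    return family


def is_valid(family):
    """All sizes even, all pairwise intersections odd, all sets distinct."""
    even_sizes = all(len(c) % 2 == 0 for c in family)
    odd_overlaps = all(len(a & b) % 2 == 1 for a, b in combinations(family, 2))
    return even_sizes and odd_overlaps and len(set(family)) == len(family)
```

Calling `star_family(5)` returns five sets and `star_family(6)` returns five sets as well, matching the values n and n - 1 in the theorem.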


This separation between odd and even n may seem surprising to us (linear algebra “shouldn’t be able to tell” the parity of n), but we'll see in a moment where this comes up in the proof.


Proof. (We have achieved both equality cases by construction.) First, we’ll do the even n case, and we'll again use the same linear algebra setup that we've been using in our earlier proofs. Let $v_1, \dots, v_m \in \mathbb{F}_2^n$ be the incidence vectors of the sets $C_1, \dots, C_m$, respectively; the theorem statement requires that $v_i \cdot v_j = 0$ if $i = j$ and $v_i \cdot v_j = 1$ otherwise (remember that dot products count the number of common 1s in the two vectors).

Notice that this time we need to prove $m \le n - 1$, so showing linear independence isn’t enough on its own. But in this case we only need a little more, essentially because the vectors also can’t span the whole space. In particular, if we consider the subspace $U = \{(x_1, \dots, x_n) \in \mathbb{F}_2^n : x_1 + \cdots + x_n = 0\}$, which is clearly a linear subspace, notice that $U$ is exactly the set of vectors with an even number of 1s, and thus this is also the set of vectors $u$ such that $u \cdot u = 0$. So because $v_1, \dots, v_m$ are all in $U$, their span is contained in $U$, which is strictly smaller than $\mathbb{F}_2^n$. Thus the span is of dimension at most $\dim U = n - 1$, and now it suffices to show linear independence.

Indeed, suppose for the sake of contradiction that $m \ge n$. Because $\dim U = n - 1$, these vectors must be linearly dependent; consider the first n of these vectors. Then we can write

$$\lambda_1 v_1 + \cdots + \lambda_n v_n = 0$$

for some $\lambda_1, \dots, \lambda_n \in \mathbb{F}_2$ not all zero. If we now take the dot product with $v_j$, we find that

$$\sum_{i} \lambda_i (v_i \cdot v_j) = \sum_{i \ne j} \lambda_i = 0,$$

or equivalently, adding $\lambda_j$ to both sides, we have

$$\lambda_1 + \cdots + \lambda_n = \lambda_j.$$

Since this holds for all $j$, the $\lambda_j$ must all be equal, and because we assumed they’re not all zero, we must have $\lambda_j = 1$ for all $j$. But then the equation reads $n = 1$ in $\mathbb{F}_2$, which is a contradiction if n is even (the left side is 0 and the right side is 1), so indeed we can only have $m \le n - 1$ in this case, as desired.


The case where n is odd can now follow in a variety of ways: we can either replace n with n + 1 in the linear independence argument, or we can replace each set with its complement and apply the odd-town theorem from last lecture. But a third method is to take the clubs $C_i \subseteq \{1, 2, \dots, n\}$ and consider them as subsets of $\{1, 2, \dots, n + 1\}$ instead. That doesn't change the sizes of the sets and their intersections (we're just adding a citizen who doesn't participate in any clubs), and since n + 1 is even, the even case gives $m \le (n + 1) - 1 = n$.


The last case, where both $|C_i|$ and $|C_i \cap C_j|$ are odd, is left as an exercise on our homework (which will be posted next Tuesday). We'll just say one more thing about the odd-town theorem before we move on: if we return to the classical odd-town theorem, where $|C_i|$ is odd and $|C_i \cap C_j|$ is even, we can actually restate the proof in a slightly different way:


Alternative proof of the odd-town theorem. As before, let $v_1, \dots, v_m \in \mathbb{F}_2^n$ be the incidence vectors of $C_1, \dots, C_m$, where $v_i \cdot v_j = 1$ if $i = j$ and 0 otherwise. The dot product $v_i \cdot v_j$ can also be written as the matrix multiplication $v_i^T v_j$ (where we treat each $v_i$ as an $n \times 1$ matrix). So if we define the $n \times m$ matrix $A$ with columns $v_1, \dots, v_m$, then $A^T A$ has $(i, j)$th entry $v_i^T v_j$, so $\boxed{A^T A = I_m}$ (where $I_m$ denotes the $m \times m$ identity matrix). But this means that $m = \mathrm{rank}(I_m) = \mathrm{rank}(A^T A) \le \mathrm{rank}(A)$ (since the rank of a product is at most the rank of each factor), and that rank is at most n because $A$ is an $n \times m$ matrix. Thus $m \le n$ as desired.
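To make the boxed identity concrete, the sketch below (illustrative only; the example family and the name `gram_mod2` are choices made here, not taken from the lecture) builds the matrix A for the four 3-element subsets of {1,2,3,4} and computes A^T A modulo 2.

```python
def gram_mod2(clubs, n):
    """Return A^T A over F_2, where the columns of A are the incidence vectors."""
    # A[c][i] = 1 exactly when citizen c+1 belongs to club i.
    A = [[1 if citizen in club else 0 for club in clubs] for citizen in range(1, n + 1)]
    m = len(clubs)
    # Entry (i, j) equals |C_i ∩ C_j| reduced mod 2.
    return [
        [sum(A[c][i] * A[c][j] for c in range(n)) % 2 for j in range(m)]
        for i in range(m)
    ]


# Example odd-town family: every club has 3 members, any two share exactly 2.
CLUBS = [{1, 2, 3}, {1, 2, 4}, {1, 3, 4}, {2, 3, 4}]
GRAM = gram_mod2(CLUBS, 4)
```

Here `GRAM` comes out as the 4 × 4 identity matrix, which is the situation the rank argument relies on.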


Presenting this proof is also meant to motivate the next result that we're presenting: 


Theorem 9 (Fisher's inequality) 


Let $C_1, \dots, C_m \subseteq \{1, 2, \dots, n\}$ be distinct nonempty subsets, such that all pairwise intersections $C_i \cap C_j$ (for $i \ne j$) have the same size. Then $m \le n$.


Proof. We can achieve $m = n$ either by setting $C_i = \{i\}$, or using the construction from Problem 7, or using all subsets of size $(n - 1)$. To show the inequality, first say that $|C_i \cap C_j| = t$ for all $i \ne j$, for some integer $t \ge 0$, meaning that $|C_i| \ge t$ for all $i$. We split into two cases (assuming $m \ge 2$, since $m = 0, 1$ is clear):


• Suppose $|C_i| = t$ for some $i$. Without loss of generality, suppose $i = 1$, and notice that in this case $t \ge 1$ because our subsets are nonempty. Then each of the other sets $C_i$ must contain $C_1$, so that $C_i \cap C_1$ contains t elements, and in fact since $C_1 \subseteq C_\ell, C_h$, we have

$$C_1 \subseteq C_\ell \cap C_h \quad \text{for all } 2 \le \ell < h \le m.$$

Now because $C_1$ and $C_\ell \cap C_h$ both have t elements, that means $C_\ell \cap C_h = C_1$ for all $2 \le \ell < h \le m$. Thus the $(m - 1)$ sets $C_2 \setminus C_1, \dots, C_m \setminus C_1$ are disjoint subsets of the set $\{1, 2, \dots, n\} \setminus C_1$, and they are nonempty because the $C_i$s are distinct. Thus there are at most $|\{1, 2, \dots, n\} \setminus C_1| = n - |C_1|$ of these sets of the form $C_i \setminus C_1$, and therefore $m - 1 \le n - |C_1| \le n - 1 \implies m \le n$, as desired.


• Now suppose that $|C_i| > t$ for all $1 \le i \le m$. Here's where the linear algebra comes in: we again construct incidence vectors $v_1, \dots, v_m \in \{0,1\}^n$, but this time because we don't have parity restrictions we will consider these vectors living in $\mathbb{R}^n$ instead. Then we find that $v_i \cdot v_j = t$ for $i \ne j$ and $v_i \cdot v_i > t$ for all $i$, meaning that if we form the $n \times m$ matrix $A$ with columns $v_1, \dots, v_m$, then $A^T A$ has off-diagonal entries t and diagonal entries $|C_i| > t$. Thus, we have

$$A^T A = tJ + D,$$

where $J$ is the $m \times m$ all-ones matrix and $D$ is a diagonal matrix with strictly positive entries $|C_i| - t$. But again $\mathrm{rank}(A^T A) \le \mathrm{rank}(A) \le n$, so it suffices to show that our matrix $A^T A$ has full rank m. Indeed, this is because our matrix is positive definite (in other words, $x^T (A^T A) x > 0$ for all nonzero $x \in \mathbb{R}^m$), since $tJ$ is positive semidefinite and $D$ is positive definite; more explicitly, if $x$ is the column vector with entries $(x_1, \dots, x_m)$, then (viewing expressions of the form $x^T M x$ as bilinear form calculations)

$$x^T (A^T A) x = x^T (tJ) x + x^T D x = t \sum_{i,j=1}^{m} x_i x_j + \sum_{i=1}^{m} (|C_i| - t) x_i^2.$$

These terms simplify: the first term becomes $t (x_1 + \cdots + x_m)^2 \ge 0$, and the second term is a sum of strictly positive terms unless the corresponding $x_i$s are zero. So the only way for $x^T (A^T A) x \le 0$ is if $x_i = 0$ for all $i$, as desired. So $A^T A$ has full rank m (otherwise we would have a nonzero vector $x$ in the kernel, which would have implied that $x^T (A^T A) x = 0$), and thus $m \le n$, completing the proof.
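The second case can be illustrated on a standard example that is not discussed in the lecture: the seven lines of the Fano plane, which are 3-element subsets of {1,...,7} meeting pairwise in exactly one point (so t = 1 and m = n = 7). The Python sketch below, with the hypothetical helpers `gram` and `rank`, checks that the Gram matrix has the form tJ + D and has full rank, using exact rational arithmetic.

```python
from fractions import Fraction

# Lines of the Fano plane: any two of them meet in exactly one point.
FANO = [{1, 2, 3}, {1, 4, 5}, {1, 6, 7}, {2, 4, 6}, {2, 5, 7}, {3, 4, 7}, {3, 5, 6}]


def gram(sets):
    """Gram matrix A^T A of the incidence vectors: entry (i, j) is |C_i ∩ C_j|."""
    return [[len(a & b) for b in sets] for a in sets]


def rank(matrix):
    """Rank over the rationals via Gaussian elimination with exact fractions."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    rk = 0
    for col in range(len(rows[0])):
        pivot = next((r for r in range(rk, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for r in range(rk + 1, len(rows)):
            factor = rows[r][col] / rows[rk][col]
            rows[r] = [x - factor * y for x, y in zip(rows[r], rows[rk])]
        rk += 1
    return rk


G = gram(FANO)
```

In this example `G` has 3 on the diagonal and 1 elsewhere, that is, G = J + 2I, and `rank(G)` equals 7, so Fisher's bound m ≤ n holds with equality.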


3 February 8, 2022


Last week, we discussed theorems about families of sets with certain rules about their intersections. We'll discuss 


another result of this type today, but we'll now consider two different families of sets together: 


Theorem 10 (Skew Bollobás Theorem)

Suppose $A_1, \dots, A_n$ are sets of size $|A_i| = r$, and $B_1, \dots, B_n$ are sets of size $|B_i| = s$ (with no restriction on the ground set that they live in). Suppose that $A_i \cap B_i = \emptyset$ for all $1 \le i \le n$, but $A_i \cap B_j \ne \emptyset$ for $1 \le i < j \le n$ (note that this is not the same as $i \ne j$). Then $n \le \binom{r+s}{r}$.


This is pretty different from the other results we've been seeing, specifically because there's no restriction on what elements our $A_i$s and $B_i$s can contain, and yet we still have an upper bound on n.


First, we show that this bound is actually sharp: let $n = \binom{r+s}{r} = \binom{r+s}{s}$ and let the $A_i$s be all of the r-element subsets of $\{1, 2, \dots, r + s\}$. If we then define $B_i = \{1, 2, \dots, r + s\} \setminus A_i$, we know that $A_i \cap B_i = \emptyset$, and we also have $A_i \cap B_j \ne \emptyset$ for $i \ne j$ because there are only $r + s$ total elements (so a set of size r and a set of size s can only be disjoint if they are complements, and $A_i \ne A_j$).
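This sharpness construction can be checked mechanically for small parameters. The following Python sketch is only an illustration (the function names are invented here); it builds the pairs and verifies both intersection conditions.

```python
from itertools import combinations
from math import comb


def complement_pairs(r, s):
    """All r-subsets A_i of {1,...,r+s}, each paired with its complement B_i."""
    ground = frozenset(range(1, r + s + 1))
    A = [frozenset(c) for c in combinations(sorted(ground), r)]
    B = [ground - a for a in A]
    return A, B


def satisfies_bollobas(A, B):
    """A_i and B_i are disjoint, while A_i and B_j intersect whenever i != j."""
    n = len(A)
    disjoint_diagonal = all(not (A[i] & B[i]) for i in range(n))
    crossing = all(A[i] & B[j] for i in range(n) for j in range(n) if i != j)
    return disjoint_diagonal and crossing
```

With r = 2 and s = 3 this yields `comb(5, 2)` = 10 pairs, and the condition holds for all i ≠ j, not only for i < j.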


Fact 11 
Notice that our theorem statement only required $A_i \cap B_j \ne \emptyset$ for $i < j$, but in fact this construction achieves the condition for $i > j$ as well. Bollobás only proved this theorem for the case where $i \ne j$, and that result turns out to be significantly easier; this “skew” version with $i < j$ is what we'll show now.


Before we do the actual proof, we'll begin by thinking about the moment curve in $\mathbb{R}^d$, which is the set

$$\{(1, x, x^2, \dots, x^{d-1}) : x \in \mathbb{R}\}.$$


Lemma 12 


Any d distinct vectors on this moment curve are linearly independent.


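Proof. Indeed, if we have our vectors $(1, x_1, \dots, x_1^{d-1}), (1, x_2, \dots, x_2^{d-1}), \dots, (1, x_d, \dots, x_d^{d-1})$ with distinct $x_i$s, we can consider the determinant of the matrix formed by putting them together as columns,

$$\det \begin{pmatrix} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_d \\ \vdots & \vdots & & \vdots \\ x_1^{d-1} & x_2^{d-1} & \cdots & x_d^{d-1} \end{pmatrix}.$$

This is the Vandermonde determinant, and it turns out this expression is equal to $\prod_{i < j} (x_j - x_i)$ (by considering the degree of the polynomial in the $x_i$s and using the factor theorem). And because we assumed that we had distinct $x_i$s, this determinant is indeed nonzero.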


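But if we’re not satisfied with that proof, here’s another way to think about it: suppose this determinant is zero, so that the rows are linearly dependent as well. Thus, there exist constants $a_0, \dots, a_{d-1}$, not all zero, so that

$$0 = a_0 \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix} + a_1 \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_d \end{pmatrix} + \cdots + a_{d-1} \begin{pmatrix} x_1^{d-1} \\ x_2^{d-1} \\ \vdots \\ x_d^{d-1} \end{pmatrix} = \begin{pmatrix} a_0 + a_1 x_1 + \cdots + a_{d-1} x_1^{d-1} \\ a_0 + a_1 x_2 + \cdots + a_{d-1} x_2^{d-1} \\ \vdots \\ a_0 + a_1 x_d + \cdots + a_{d-1} x_d^{d-1} \end{pmatrix}.$$

In other words, $x_1, \dots, x_d$ are all roots of the polynomial $a_0 + a_1 x + \cdots + a_{d-1} x^{d-1}$ of degree at most $d - 1$, and because the $x_i$s are distinct this can only occur if the polynomial is the zero polynomial, contradicting the fact that the $a_k$ are not all zero.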
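The Vandermonde identity used in the first proof can be spot-checked numerically. This is a minimal Python sketch with integer arithmetic; the Leibniz-formula determinant is chosen for brevity and is only practical for small d, and all names are introduced for this illustration.

```python
from itertools import permutations
from math import prod


def det(matrix):
    """Determinant via the Leibniz permutation formula (fine for small matrices)."""
    n = len(matrix)
    total = 0
    for perm in permutations(range(n)):
        inversions = sum(
            1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j]
        )
        sign = -1 if inversions % 2 else 1
        total += sign * prod(matrix[i][perm[i]] for i in range(n))
    return total


def moment_vector(x, d):
    """The point (1, x, x^2, ..., x^(d-1)) on the moment curve in R^d."""
    return [x ** k for k in range(d)]


def vandermonde_product(xs):
    """Product of (x_j - x_i) over all pairs i < j."""
    return prod(xs[j] - xs[i] for i in range(len(xs)) for j in range(i + 1, len(xs)))


XS = [2, 3, 5, 7]
MATRIX = [moment_vector(x, len(XS)) for x in XS]  # rows are moment-curve points
```

The rows of `MATRIX` are the moment-curve points, so it is the transpose of the matrix in the proof and has the same determinant; `det(MATRIX)` and `vandermonde_product(XS)` agree and are nonzero because the chosen values are distinct.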


We'll also need some additional linear algebra tools to solve this problem, namely the concept of an exterior algebra 


(which might take some time to get used to): 


Definition 13 


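Let $W$ be a vector space over $\mathbb{R}$ (this also works over other fields of characteristic not equal to 2). The kth exterior power of $W$, denoted $\Lambda^k(W)$, is the quotient of the tensor product $W \otimes W \otimes \cdots \otimes W$ (k times) by relations of the form

$$z_1 \otimes \cdots \otimes z_{i-1} \otimes x \otimes y \otimes z_{i+2} \otimes \cdots \otimes z_k = - z_1 \otimes \cdots \otimes z_{i-1} \otimes y \otimes x \otimes z_{i+2} \otimes \cdots \otimes z_k,$$

for all $1 \le i \le k - 1$ and $x, y, z_1, \dots, z_k \in W$.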


In other words, we start with the tensor product $W^{\otimes k}$. Luckily, because we're tensoring over a field rather than a ring, things are pretty simple. Specifically, we can write down a basis of $W^{\otimes k}$ by fixing a basis of $W$ and then looking at “pure tensors” of the form $z_1 \otimes z_2 \otimes \cdots \otimes z_k$, where each $z_i$ is a basis element of $W$. Then we can take arbitrary linear combinations of such pure tensors, and that gives us all of the elements of $W^{\otimes k}$.

From there, the exterior power is formed by saying that if we flip the order of two adjacent elements, then we get the same element back but with a negative sign.


Fact 14 


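Let $w_1, \dots, w_\ell$ be a basis of $W$. Then elements of the form

$$w_{i_1} \wedge w_{i_2} \wedge \cdots \wedge w_{i_k}, \quad 1 \le i_1 < i_2 < \cdots < i_k \le \ell,$$

which are the images of $w_{i_1} \otimes w_{i_2} \otimes \cdots \otimes w_{i_k}$ after quotienting, form a basis of $\Lambda^k(W)$.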


(We ask that the indices are strictly increasing to avoid linear dependence due to switching. In particular, if two indices are the same, then swapping until we get those indices next to each other shows us that the element is its own negative by swapping again, meaning that we just have zero.) Because of this fact, we know that $\dim \Lambda^k(W) = \binom{\dim W}{k}$, and thus if $k > \dim W$ the exterior power is just the zero vector space. More generally, we have the following fact:


Fact 15 


For any $w_1, \dots, w_k \in W$, we have $w_1 \wedge w_2 \wedge \cdots \wedge w_k \ne 0$ if and only if the $w_i$s are linearly independent in $W$.


With that, we're ready to return to the original proof: 


Proof of Theorem 10. Suppose we have our sets $A_1, \dots, A_n, B_1, \dots, B_n$. These are all subsets of some (potentially large but finite) set of size $M$, which we'll label $\{1, 2, \dots, M\}$. (Even if we had infinite families of $A_i$s and $B_i$s, this argument would still apply after truncating those families to their first $\binom{r+s}{r} + 1$ members.) Let $w_1, \dots, w_M$ be $M$ distinct vectors on the moment curve in $\mathbb{R}^{r+s}$; by Lemma 12, any set of $(r + s)$ of the vectors $w_1, \dots, w_M$ will be linearly independent. (This linear independence for any set of $(r + s)$ vectors is actually all we need from the moment curve.) We now define

$$a_i = w_{j_1} \wedge \cdots \wedge w_{j_r} \in \Lambda^r(\mathbb{R}^{r+s}), \quad \text{where } j_1 < \cdots < j_r \text{ are the elements of } A_i;$$

in other words, we wedge the vectors indexed by the elements of $A_i$ together in increasing order (so if $A_1 = \{2, 3, 5\}$, then $a_1 = w_2 \wedge w_3 \wedge w_5$). Similarly, we also define

$$b_i = w_{j'_1} \wedge \cdots \wedge w_{j'_s} \in \Lambda^s(\mathbb{R}^{r+s}), \quad \text{where } j'_1 < \cdots < j'_s \text{ are the elements of } B_i.$$

But now we can form $a_i \wedge b_j \in \Lambda^{r+s}(\mathbb{R}^{r+s})$ for any $1 \le i, j \le n$ (because $a_i$ is always a wedge of r elements, and $b_j$ is always a wedge of s elements). Then $a_i \wedge b_i \ne 0$ for any $i$, because $A_i$ and $B_i$ are disjoint, so we're wedging together $(r + s)$ distinct vectors from the moment curve, which are linearly independent, and then we use Fact 15. On the other hand, $a_i \wedge b_j = 0$ for any $i < j$, because $A_i$ and $B_j$ share an element and so a vector is repeated (which means we don’t have linear independence). This is the point where it’s important that our $w_i$ vectors live in $\mathbb{R}^{r+s}$!

So now notice that $a_1, \dots, a_n \in \Lambda^r(\mathbb{R}^{r+s})$ are all vectors in a $\binom{r+s}{r}$-dimensional vector space, so to show that $n \le \binom{r+s}{r}$, it suffices to show that the $a_i$s are linearly independent. Indeed, suppose we have some linear combination of the form

$$\lambda_1 a_1 + \cdots + \lambda_n a_n = 0$$

in $\Lambda^r(\mathbb{R}^{r+s})$ for some $\lambda_i \in \mathbb{R}$, not all zero. We could wedge with $b_n, b_{n-1}, \dots, b_1$ in that order to show that $\lambda_n, \lambda_{n-1}, \dots, \lambda_1$ are zero, but more concisely we can let $j \in \{1, \dots, n\}$ be the maximum index with $\lambda_j \ne 0$. If we wedge our equation above with $b_j$, we then find that

$$(\lambda_1 a_1 + \cdots + \lambda_n a_n) \wedge b_j = 0.$$

We can now distribute (because tensoring is multilinear, so is wedging), so that this equation becomes

$$\lambda_1 a_1 \wedge b_j + \cdots + \lambda_n a_n \wedge b_j = 0.$$

But now for $i < j$, $a_i \wedge b_j = 0$, and for $i > j$, $\lambda_i = 0$. So the only term that remains is $\lambda_j a_j \wedge b_j$, and thus $\lambda_j a_j \wedge b_j = 0$; because $a_j \wedge b_j \ne 0$, we get $\lambda_j = 0$, a contradiction. Thus no nontrivial linear relation exists, the $a_i$s are linearly independent, and $n \le \binom{r+s}{r}$.
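Since $\Lambda^{r+s}(\mathbb{R}^{r+s})$ is one-dimensional, each wedge $a_i \wedge b_j$ is a scalar multiple of $e_1 \wedge \cdots \wedge e_{r+s}$, and that scalar is the determinant of the matrix whose rows are the $r + s$ vectors being wedged. The Python sketch below uses this standard fact to illustrate the triangular structure in the proof. The example family (r = 2, s = 1, ground set {1,...,5}) and the choice $w_k = (1, k, k^2)$ are assumptions made for this illustration, as are the function names.

```python
from itertools import permutations
from math import prod


def det(matrix):
    """Determinant via the Leibniz permutation formula (small matrices only)."""
    n = len(matrix)
    total = 0
    for perm in permutations(range(n)):
        inversions = sum(
            1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j]
        )
        total += (-1 if inversions % 2 else 1) * prod(
            matrix[i][perm[i]] for i in range(n)
        )
    return total


def moment_vector(x, d):
    """The point (1, x, ..., x^(d-1)) on the moment curve in R^d."""
    return [x ** k for k in range(d)]


def wedge_coefficient(a_set, b_set, dim):
    """Coefficient of e_1 ^ ... ^ e_dim in a ^ b, for |a_set| + |b_set| = dim."""
    rows = [moment_vector(k, dim) for k in sorted(a_set)]
    rows += [moment_vector(k, dim) for k in sorted(b_set)]
    return det(rows)


# A genuinely skew example: A_3 misses B_1 and B_2, which is allowed since 3 > 1, 2.
A = [{1, 2}, {2, 3}, {4, 5}]
B = [{3}, {1}, {2}]
TABLE = [[wedge_coefficient(a, b, 3) for b in B] for a in A]
```

In `TABLE`, every entry with i < j is zero and every diagonal entry is nonzero, which is exactly the pattern that forces the $a_i$ to be linearly independent.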


Remark 16. On our homework, we'll see that we can also do a two-family version of the odd-town theorem, and we'll see how probabilistic considerations show up there. But for this skew Bollobás result, only linear algebra proofs are currently known (though there are others which only use the moment curve or only use the exterior algebra).


Fact 17 


Last week, we tried flipping the roles of “even” and “odd” in the even- and odd-town theorems, but that won't give us anything interesting for the skew Bollobás result: in all three other modifications of the theorem conditions, we can easily have infinitely many sets. So that’s another reason that we should find this result surprising!


4 February 10, 2022 


We'll move on to a different question (of a geometric flavor) today, starting with a warmup version: 


Problem 18 
What is the maximum number of lines in $\mathbb{R}^2$ through the origin, such that the angle between any two of them is the same?


If we have two lines at an angle of $\alpha$ apart, then adding a third line must require that line to be at an angle of $\alpha$ from both of the first two lines. This can be achieved by having three lines at 60 degree angles from each other, and indeed we can check that $\boxed{3}$ is the maximum. (The reason that we ask the lines to be through the origin is to avoid infinitely many parallel lines.) We'll now move up by one dimension:


Problem 19 


What is the maximum number of lines in $\mathbb{R}^3$ through the origin, such that the angle between any two of them is the same?


Two ways that we can again get 3 lines through the origin are to consider three pairwise orthogonal lines, or to use the construction from $\mathbb{R}^2$. And one way we can get 4 lines through the origin is to take the vertices of a tetrahedron centered at the origin and connect them to the center (or equivalently to take the four long diagonals of a cube, which is the same construction but potentially easier to visualize).

It is then natural to try using other Platonic solids to get a similar symmetry, and indeed we can do this by taking the longest diagonals of an icosahedron:


Since icosahedrons have 12 vertices, and connecting opposite vertices passes through the center, this gives us $\boxed{6}$ lines. And indeed, this gives us equiangular lines because to get from any diagonal to another, we look at one endpoint, take one of the five adjacent vertices, and form the diagonal from that vertex. And this is in fact the best that we can do:


Theorem 20 
There are at most $\binom{d+1}{2}$ lines through the origin in $\mathbb{R}^d$ such that the angle between any two of them is the same.


Proof. Suppose we have a set of equiangular lines $\ell_1, \dots, \ell_n$, all separated by some angle $\theta \in (0, \pi/2]$. We aim to prove that $n \le \binom{d+1}{2}$, and we'll do so using linear algebra. For all $1 \le i \le n$, let $v_i$ be a unit vector in the direction of $\ell_i$ (there are always two choices, but we can arbitrarily pick between them). Then based on our restrictions, we have

$$v_i \cdot v_j = \begin{cases} 1 & i = j, \\ \pm \cos\theta & i \ne j, \end{cases}$$

because the dot product is the signed length of the projection of $v_i$ onto $v_j$, where the $\pm$ sign depends on the direction of the line that we've chosen in each case. This time, the strategy is not to show linear independence of the vectors $v_i$ (that won't work). Instead, for each $1 \le i \le n$, we consider the $d \times d$ matrix $v_i v_i^T$. This is symmetric because $(v_i v_i^T)^T = (v_i^T)^T (v_i)^T = v_i v_i^T$, and we can notice that the dimension of the space of symmetric $d \times d$ matrices is $d + (d - 1) + \cdots + 1 = \frac{d(d+1)}{2} = \binom{d+1}{2}$. Thus, it suffices to prove linear independence of the matrices $v_1 v_1^T, v_2 v_2^T, \dots, v_n v_n^T$.


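Indeed, suppose we have some linear combination of those matrices satisfying

$$\sum_{i=1}^{n} \lambda_i v_i v_i^T = 0$$

(note that this is an equality of $d \times d$ matrices). To turn this into an equality of numbers, multiply by $v_j^T$ from the left and $v_j$ from the right: since $v_j^T v_i = v_i \cdot v_j$, we in fact have

$$\sum_{i=1}^{n} \lambda_i v_j^T v_i v_i^T v_j = 0 \implies \sum_{i=1}^{n} \lambda_i (v_i \cdot v_j)^2 = 0.$$

But now the $\pm$ ambiguity in the dot products goes away, and we're left with the relation

$$\lambda_j + \cos^2\theta \sum_{i \ne j} \lambda_i = 0$$

for any $j$. In particular, this means that

$$\cos^2\theta \sum_{i=1}^{n} \lambda_i = (\cos^2\theta - 1) \lambda_j,$$

and because $0 < \theta \le \pi/2$ (so that $\cos^2\theta - 1 \ne 0$) this can only hold for every $j$ if all $\lambda_j$s are equal. Writing $\lambda$ for the common value, the left side $n \cos^2\theta \cdot \lambda$ has the same sign as $\lambda$ (or is zero) while the right side $(\cos^2\theta - 1)\lambda$ has the opposite sign, so this can only occur if the $\lambda_i$ are all zero. Thus linear independence is shown, and $n \le \binom{d+1}{2}$.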
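The icosahedron example can be checked numerically. The sketch below is an illustration in floating point, assuming the standard icosahedron coordinates given by the cyclic permutations of (0, ±1, ±φ) with φ the golden ratio; the vectors are left unnormalized, which does not affect equiangularity or linear independence, and the helper names are invented here.

```python
import math
from itertools import combinations

PHI = (1 + math.sqrt(5)) / 2  # golden ratio

# One vertex from each antipodal pair of the icosahedron; each spans one diagonal.
LINES = [
    (0, 1, PHI), (0, 1, -PHI),
    (1, PHI, 0), (1, -PHI, 0),
    (PHI, 0, 1), (-PHI, 0, 1),
]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def outer_flat(v):
    """Entries of the matrix v v^T, flattened into a single list."""
    return [a * b for a in v for b in v]


def rank(rows, tol=1e-9):
    """Numerical rank via Gaussian elimination with partial pivoting."""
    rows = [list(map(float, r)) for r in rows]
    rk = 0
    for col in range(len(rows[0])):
        pivot = max(
            range(rk, len(rows)), key=lambda r: abs(rows[r][col]), default=None
        )
        if pivot is None or abs(rows[pivot][col]) < tol:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for r in range(rk + 1, len(rows)):
            factor = rows[r][col] / rows[rk][col]
            rows[r] = [x - factor * y for x, y in zip(rows[r], rows[rk])]
        rk += 1
    return rk


ABS_DOTS = [abs(dot(u, v)) for u, v in combinations(LINES, 2)]
OUTER_RANK = rank([outer_flat(v) for v in LINES])
```

All fifteen values in `ABS_DOTS` agree (each equals φ up to rounding), so the six lines are equiangular, and `OUTER_RANK` is 6, matching the dimension $\binom{4}{2} = 6$ of the space of symmetric 3 × 3 matrices.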


This bound is tight for $d = 2, 3$, but it is not tight in general. Nevertheless, we do have a lower bound which is also quadratic in d. So the true answer is known up to a constant factor, which is often the best we can do in extremal combinatorics.


Fact 21 


On the other hand, this flavor of question can be very difficult: those lower bound constructions all have common angle tending to 90 degrees, and if we instead ask the related question of “what is the maximum number of lines in $\mathbb{R}^d$ with some fixed common angle $\alpha$?”, the answer is instead at most linear in d, and refining those bounds is ongoing research (partially done by some of the students in this room!).


Notice that the unit vectors $v_1, \dots, v_n$ chosen along our n lines give us n points on the unit sphere, and those points have the special property that there are only two possible distances between them (depending on the orientation of the lines). That motivates the next question of point-sets with only two distances, and we'll again start off with a preliminary question:


Problem 22 


What is the maximum number of points in $\mathbb{R}^d$ such that any two points are the same distance apart?


The answer is $\boxed{d + 1}$, formed by taking the vertices of a regular d-dimensional simplex; finding a nice way to prove this (such as with linear algebra similar to Fisher's inequality) is on our homework. So such a situation is extremely constrained, and we'll relax the constraints to get an interesting problem now:


Problem 23 


What is the maximum number of points in $\mathbb{R}^d$ such that any two points are one of two distances apart?


Notice that we are allowing any two distances in this problem, while in the equiangular lines case the two distances 
were actually related to each other by geometric considerations. But it turns out that the answer even with this extra 


freedom is the same to leading order: 


Theorem 24 


Let $p_1, \dots, p_n \in \mathbb{R}^d$ be distinct points such that there are only two distinct distances between them. Then $n \le \frac{1}{2}(d^2 + 5d + 4)$.


In this proof, we'll see the introduction of a cool technique which will come up more in future lectures as well. 


Start of proof. Let the two distances between the points be $a, b \ne 0$ (if there’s only one distance then we can let b be some arbitrary positive number). We'll be showing linear independence again, but this time we need to be a bit more clever with the objects for which we are proving that linear independence.

Specifically, for all $1 \le i \le n$, consider the function $f_i : \mathbb{R}^d \to \mathbb{R}$ given by

$$f_i(x) = (\|x - p_i\|^2 - a^2)(\|x - p_i\|^2 - b^2).$$

(Here, $\|x - p_i\|$ denotes the distance between the points $x$ and $p_i$.) This is a real-valued function satisfying $\boxed{f_i(p_j) = 0 \text{ for all } i \ne j}$ (since the distance between $p_i$ and $p_j$ is either a or b) and $\boxed{f_i(p_i) = a^2 b^2}$.




We now claim that the functions $f_1, \dots, f_n$ are linearly independent. Indeed, suppose we have $\lambda_1 f_1 + \cdots + \lambda_n f_n = 0$ (this is an equation of functions); evaluating both sides at $p_i$ makes most terms disappear by the boxed relations above, so we're just left with $\lambda_i f_i(p_i) = \lambda_i a^2 b^2 = 0$, so that $\lambda_i = 0$. Repeating this for all $i$ shows the claim.

The space of functions from $\mathbb{R}^d$ to $\mathbb{R}$ is infinite-dimensional, so the dimension considerations we've been doing so far are not directly useful for us yet. But now we can take a closer look at $f_i(x)$ and understand why we chose functions of that form: notice that the $f_i$s are all polynomials in the d coordinates $x = (x_1, \dots, x_d)$, because $\|x - p_i\|^2 = \sum_{k=1}^{d} (x_k - p_i^{(k)})^2$ (where $p_i^{(k)}$ denotes the kth coordinate of $p_i$) expands out to a sum of squares in those coordinates. So all $f_i$s are in the (much smaller) space of polynomials in d coordinates. That space is still infinite-dimensional, but next lecture we'll continue to narrow down some properties of $f_i$ (for example, they all have degree at most 4, finally giving us a finite-dimensional vector space) and finish the proof.
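The boxed relations can be seen on a concrete two-distance set. The following Python sketch is an illustration only: the point set (all vectors $e_i + e_j$ with $i < j$ in $\mathbb{R}^4$, whose pairwise squared distances are 2 or 4) and the function names are choices made here rather than part of the lecture.

```python
from itertools import combinations


def two_distance_points(d):
    """The points e_i + e_j (i < j) in R^d, as integer tuples."""
    points = []
    for i, j in combinations(range(d), 2):
        p = [0] * d
        p[i] = p[j] = 1
        points.append(tuple(p))
    return points


def sq_dist(x, p):
    """Squared Euclidean distance between x and p."""
    return sum((xk - pk) ** 2 for xk, pk in zip(x, p))


def f(p, x, a2, b2):
    """f_p(x) = (|x - p|^2 - a^2)(|x - p|^2 - b^2), given a2 = a^2 and b2 = b^2."""
    s = sq_dist(x, p)
    return (s - a2) * (s - b2)


POINTS = two_distance_points(4)
A2, B2 = 2, 4  # the two squared distances occurring in this example
# EVALS[i][j] is f_i evaluated at p_j.
EVALS = [[f(p, q, A2, B2) for q in POINTS] for p in POINTS]
```

The table `EVALS` is $a^2 b^2 = 8$ times the identity matrix, so any linear relation among the $f_i$ is forced to be trivial by evaluating at each point in turn.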


5 February 15, 2022 


Last lecture, we considered sets of points where there are only two distinct distances among the points, and we started proving that there can be at most $\frac{1}{2}(d^2 + 5d + 4)$ such points in $\mathbb{R}^d$ (quadratic in d, while with only one distinct distance the maximum is only linear in d). We'll continue the proof here:


Proof, continued. Recall the main steps from last time: for each point $p_i$ in our set, we defined a function
$$f_i(x) = (\|x - p_i\|^2 - a^2)(\|x - p_i\|^2 - b^2),$$
and then we showed that the different functions $f_1, \dots, f_n$ are linearly independent (because $f_j(p_i) = 0$ except when $i = j$). We mentioned last time that the space of all functions $\mathbb{R}^d \to \mathbb{R}$ is infinite-dimensional, so linear independence doesn't seem to help yet. But the next step is to reduce the size of the subspace under consideration: we saw that because we're taking squared norms, $f_i$ is actually always a polynomial, and it has degree at most 4. Since that space has dimension $\binom{d+4}{4}$, we get a quartic bound for $n$, and it remains to narrow down the bound further.

To get a more refined bound now, we'll expand out the squared norms:
$$f_i(x) = (\|x\|^2 - 2 p_i \cdot x + \|p_i\|^2 - a^2)(\|x\|^2 - 2 p_i \cdot x + \|p_i\|^2 - b^2).$$
We could expand out all 16 terms, but we can look at the terms in order of degree in $x = (x_1, \dots, x_d)$:
$$f_i(x) = \|x\|^4 - 4(p_i \cdot x)\|x\|^2 + (\text{some polynomial in } (x_1, \dots, x_d) \text{ of degree} \le 2).$$


Here, we should remember that $\|x\|^2 = x_1^2 + \dots + x_d^2$ is indeed quadratic in the $x_j$s. Specifically, this polynomial is actually
$$f_i(x) = (x_1^2 + \dots + x_d^2)^2 - 4\sum_{k=1}^{d} p_i^{(k)} x_k (x_1^2 + \dots + x_d^2) + (\text{some polynomial in } (x_1, \dots, x_d) \text{ of degree} \le 2),$$
where $p_i^{(k)}$ is the $k$th coordinate of the point $p_i$. Thus, $f_1, \dots, f_n$ live in the span of the following polynomials (we have to take all monomials of degree at most 2 to get the unspecified degree-2 part, and then we need to account for the linear combinations that we can get from the leading terms):
$$\{1,\; x_j,\; x_j^2,\; x_i x_j,\; (x_1^2 + \dots + x_d^2)^2,\; x_j(x_1^2 + \dots + x_d^2) \;:\; 1 \le i < j \le d\},$$
where the single-index terms range over all $1 \le j \le d$.

That means $f_1, \dots, f_n$ are linearly independent elements of a vector space of dimension at most (just adding up how many elements we have in the spanning list)
$$1 + d + d + \binom{d}{2} + 1 + d = \frac{1}{2}(d^2 + 5d + 4),$$
and thus $n \le \frac{1}{2}(d^2 + 5d + 4)$, as desired.


Remark 25. It turns out we can improve on this proof, because we forgot about the information in the degree-2 polynomial part: it turns out we can have $n \le \binom{d+2}{2} = \frac{1}{2}(d^2 + 3d + 2)$, and that's going to be a homework assignment for us.


It's natural now to ask about lower bounds (in particular trying to construct such a point set): one method is to take the set of vectors in the Boolean cube $\{0,1\}^d$ with exactly two ones. Then $n = \binom{d}{2} = \frac{1}{2}(d^2 - d)$, and the distances between the points are either $2$ (if there are no overlaps between which indices are 1s) or $\sqrt{2}$ (if there is one overlap).

We can further improve on this by noticing that the construction above actually has all points within the hyperplane $x_1 + \dots + x_d = 2$, so it essentially lives in $\mathbb{R}^{d-1}$ instead. So we can replace $n = \binom{d}{2}$ with $n = \binom{d+1}{2} = \frac{1}{2}(d^2 + d)$ by doing the same construction in $\mathbb{R}^{d+1}$ and then looking at the affine hyperplane ("affine" meaning that the hyperplane doesn't need to go through the origin), isomorphic to $\mathbb{R}^d$, containing all of our points. (Equivalently, we can consider a simplex of $(d+1)$ points and connect midpoints of edges: then the only two cases are whether the edges are adjacent (equilateral triangle) or not (tetrahedron).) And that's (probably) the best-known construction so far.
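As a sanity check on this construction, the following sketch enumerates the two-ones vectors in $\{0,1\}^{d+1}$, collects the squared distances that occur, and compares the count with the upper bound $\frac{1}{2}(d^2+5d+4)$ from the proof. The function names are ours and purely illustrative.

```python
from itertools import combinations
from math import comb


def two_ones_points(m):
    """All vectors in {0,1}^m with exactly two ones."""
    return [tuple(1 if k in pair else 0 for k in range(m))
            for pair in combinations(range(m), 2)]


def squared_distances(points):
    """Set of squared distances occurring between distinct points."""
    return {sum((a - b) ** 2 for a, b in zip(x, y))
            for x, y in combinations(points, 2)}


def summary(d):
    # The points live in a hyperplane of R^{d+1}, i.e. in a copy of R^d.
    pts = two_ones_points(d + 1)
    return {
        "n": len(pts),
        "squared_distances": squared_distances(pts),
        "construction": comb(d + 1, 2),
        "upper_bound": (d * d + 5 * d + 4) // 2,
    }
```

For example, `summary(3)` reports 6 points with squared distances 2 and 4, against an upper bound of 14, so there is still a gap between the construction and the bound.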

That's all we'll say about point sets of two distances for now. The key point was to do linear independence on certain smartly-constructed functions. And we'll use this polynomial technique to go back to the skew Bollobás theorem and prove it in another way not involving exterior algebras. As a reminder, the statement is as follows: suppose $A_1, \dots, A_n$ are sets of size $|A_i| = r$, and $B_1, \dots, B_n$ are sets of size $|B_i| = s$ (with no restriction on the ground set). Further suppose that $A_i \cap B_i = \emptyset$ for all $1 \le i \le n$, but $A_i \cap B_j \neq \emptyset$ for $1 \le i < j \le n$. Then $n \le \binom{r+s}{r}$.


Alternative proof of skew Bollobás. Like in Lecture 3, we'll need our lemma about the moment curve, which says that any $d$ distinct points of the form $(1, x, \dots, x^{d-1})$ in $\mathbb{R}^d$ are linearly independent (by computing the Vandermonde determinant). Fix notation so that $A_1, \dots, A_n, B_1, \dots, B_n$ are subsets of $\{1, \dots, M\}$ for some positive finite $M$, and pick $M$ distinct points $p_1, \dots, p_M$ on the moment curve in $\mathbb{R}^{r+1}$ (note that this is different from picking points from the moment curve in $\mathbb{R}^{r+s}$ like we did last time). Then any $(r+1)$ points among $p_1, \dots, p_M$ are linearly independent.

The functions we write down this time are slightly more annoying: if $B \subseteq \{1, \dots, M\}$ is an arbitrary $s$-element subset (such as $B_1, \dots, B_n$, but we want to simplify notation), we define $f_B : \mathbb{R}^{r+1} \to \mathbb{R}$ via
$$f_B(y) = \prod_{i \in B} (p_i \cdot y).$$
This is a homogeneous polynomial of degree $s$ in $y = (y_1, \dots, y_{r+1})$ (because each factor $p_i \cdot y$ is a linear combination of the $y_j$s, and there are $|B| = s$ of them). Thus, $f_B$ is always in the space of homogeneous polynomials in $\mathbb{R}[y_1, \dots, y_{r+1}]$ of degree $s$, which has dimension $\binom{s+r}{r}$ by stars and bars. So now if we can show that $f_{B_1}, \dots, f_{B_n}$ are linearly independent, we'll have the desired bound $n \le \binom{r+s}{r}$.

To show linear independence, recall that with the point-set proof, we plugged particular points into our polynomials such that $f_i(p_j) = 0$. Similarly, we want to ask here when $f_B(y) = 0$, which occurs if and only if $y$ is orthogonal to $p_\ell$ for some $\ell \in B$. Unfortunately, the way we set this up, $p_i$ and $p_j$ are likely not orthogonal, so we have to be more careful. For each $1 \le i \le n$, associate a vector $q_i$ with the set $A_i$ as follows: because $A_i$ has $r$ elements within $\{1, \dots, M\}$, that gives us $r$ linearly independent points $p_k$ in $\mathbb{R}^{r+1}$ among $p_1, \dots, p_M$, and now $q_i$ can be some nonzero vector orthogonal to all $r$ of those points (in other words, take the $r$-dimensional hyperplane spanned by the points $p_k$, and take a vector orthogonal to that hyperplane). We can then notice that
$$q_i \cdot p_\ell \begin{cases} = 0 & \ell \in A_i, \\ \neq 0 & \ell \notin A_i, \end{cases}$$
because if $\ell \in A_i$, then $p_\ell$ is one of the points in the hyperplane that $q_i$ must be orthogonal to, and if (for the sake of contradiction) $\ell \notin A_i$ but $q_i \cdot p_\ell = 0$, then $p_\ell$ would be in the hyperplane spanned by the $p_k$s for $k \in A_i$, and that would be bad because it would give $(r+1)$ points on the moment curve that are linearly dependent. So now we have the orthogonality relations that are necessary for continuing on with the proof: evaluating our functions at the points $q_i$, we have


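$$f_{B_j}(q_i) = 0 \iff q_i \cdot p_\ell = 0 \text{ for some } \ell \in B_j \iff \ell \in A_i \text{ for some } \ell \in B_j,$$
so $f_{B_j}(q_i) = 0$ if and only if $A_i \cap B_j \neq \emptyset$. We can now finally use the properties of $A_i$ and $B_j$ from the statement of skew Bollobás:
$$f_{B_j}(q_i) \begin{cases} \neq 0 & i = j, \\ = 0 & i < j, \\ \text{(unknown)} & i > j, \end{cases}$$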


and just like last lecture this is enough to show linear independence. After all, if we have some nontrivial linear combination $\lambda_1 f_{B_1} + \dots + \lambda_n f_{B_n} = 0$ (this is an equality of functions $\mathbb{R}^{r+1} \to \mathbb{R}$), then evaluating at $q_i$ for the smallest $i$ such that $\lambda_i \neq 0$, we have $\lambda_1 = \dots = \lambda_{i-1} = 0$, and also $f_{B_{i+1}}(q_i) = \dots = f_{B_n}(q_i) = 0$. So all the terms go away except $\lambda_i f_{B_i}(q_i)$, which is nonzero because we assumed $\lambda_i \neq 0$ and we know $f_{B_i}(q_i) \neq 0$, which is a contradiction. Thus linear independence is shown, and $n \le \binom{r+s}{r}$, the dimension of the space we've been considering for the $f_B$s.
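The orthogonality relations can be checked on a small example. The sketch below makes two assumptions that are ours rather than the lecture's: the test family pairs every $r$-subset of $\{0, \dots, r+s-1\}$ with its complement, and $q_i$ is taken to be the coefficient vector of $\prod_{k \in A_i}(x - t_k)$, which is one valid choice because $q_i \cdot (1, t, \dots, t^r)$ is then that polynomial evaluated at $t$.

```python
from itertools import combinations

r, s = 2, 2
ground = range(r + s)
# Assumed test family: every r-subset A_i paired with its complement B_i.
A = [set(c) for c in combinations(ground, r)]
B = [set(ground) - a for a in A]
# Moment-curve parameter t_l for ground element l, so p_l = (1, t_l, ..., t_l^r).
t = {l: l + 1 for l in ground}


def q_dot_p(i, l):
    """q_i . p_l, computed as prod_{k in A_i} (t_l - t_k)."""
    out = 1
    for k in A[i]:
        out *= t[l] - t[k]
    return out


def f(j, i):
    """f_{B_j}(q_i) = prod_{l in B_j} (p_l . q_i)."""
    out = 1
    for l in B[j]:
        out *= q_dot_p(i, l)
    return out


# Row i, column j holds f_{B_j}(q_i).
matrix = [[f(j, i) for j in range(len(A))] for i in range(len(A))]
```

For this particular family the matrix is diagonal with nonzero diagonal entries, which is stronger than the triangular shape the skew condition guarantees in general.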


6 February 17, 2022 


We'll discuss a result by Frankl and Wilson, titled “medium-size intersection is hard to avoid.” The result is interesting 
on its own, but we'll also use it later to find a counterexample to a geometric conjecture. Much like other results 


we've seen in this class, this is a statement about sets and their intersections: 


Theorem 26 (Frankl, Wilson (special case))
Let $p$ be a prime, and let $\mathcal{F}$ be a family of subsets of $\{1, \dots, n\}$ each of size $(2p - 1)$. If no two members of $\mathcal{F}$ intersect in exactly $(p - 1)$ elements, then $|\mathcal{F}| \le \binom{n}{0} + \binom{n}{1} + \dots + \binom{n}{p-1}$.

Notice that the total number of subsets of $\{1, \dots, n\}$ of size $2p - 1$ is $\binom{n}{2p-1}$, which is much larger than $\binom{n}{0} + \binom{n}{1} + \dots + \binom{n}{p-1}$ for large $n$ (because the former is a polynomial in $n$ of degree $(2p - 1)$, while the latter is a polynomial of degree $(p - 1)$). We can also get a lower-bound construction to see that this bound is asymptotically tight (for large $n$ and fixed $p$): we can get $|\mathcal{F}| = \binom{n-p}{p-1}$ by taking $\mathcal{F}$ to be the set of all $(2p-1)$-element subsets of $\{1, \dots, n\}$ that contain $\{1, \dots, p\}$. (Then the intersection of any two of these subsets has size at least $p > p - 1$.)


Proof. We'll again do a linear independence proof based on cleverly constructed functions. For each set $A \in \mathcal{F}$, we once again define its characteristic vector $v_A \in \{0,1\}^n$ (1 in the $i$th index if $A$ contains $i$), and we will also define a function $f_A$ which takes vectors $x \in \{0,1\}^n$ as inputs. Since our theorem statement has to do with intersections between sets, we want to sum up the entries of $x$ in the indices of $A$ (equivalently, do our usual dot product argument), and furthermore we want to be able to make linear independence arguments using the fact that $f_A(v_B) = 0$ for $A \neq B$ (like we've previously done). This seems to motivate the definition
$$f_A(x) = \prod_{s \in \{0, \dots, 2p-1\},\, s \neq p-1} \Big(\sum_{i \in A} x_i - s\Big),$$
but the degree of such a polynomial turns out to be too large for us to get the bound that we want. So instead, we'll only multiply over the smaller values of $s$:
$$f_A(x) = \prod_{s=0}^{p-2} \Big(\sum_{i \in A} x_i - s\Big),$$
and to compensate for the fact that $f_A(v_B)$ doesn't seem to vanish if the intersection between $A$ and $B$ is too large, we'll have $f_A$ take values in $\mathbb{F}_p$.


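Remark 27. Notice that it's equivalent over $\mathbb{F}_p$ to define this function as
$$f_A(x) = \Big(\sum_{i \in A} x_i - (p-1)\Big)^{p-1} - 1,$$
basically by using Fermat's little theorem.

We can indeed see that $f_A(x)$ is nonzero if and only if all of the factors in the product are nonzero in $\mathbb{F}_p$. Since the product runs over all residues mod $p$ except one of them, the only way for $f_A(x) \neq 0$ is if $\sum_{i \in A} x_i = p - 1$ in $\mathbb{F}_p$. Thus, $\boxed{f_A(v_B) \neq 0}$ if and only if $\sum_{i \in A} (v_B)_i = p - 1$, if and only if $|A \cap B| = p - 1$ in $\mathbb{F}_p$. Since $0 \le |A \cap B| \le 2p - 1$, this means $|A \cap B|$ is either $p - 1$ or $2p - 1$; the theorem statement rules out $p - 1$, and $|A \cap B| = 2p - 1$ forces $A = B$ (note that $|A \cap A| = 2p - 1 \equiv p - 1$ in $\mathbb{F}_p$). So this indeed occurs if and only if $\boxed{A = B}$.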


Now, the functions $\{f_A : A \in \mathcal{F}\}$ are linearly independent for the same reason as usual: for any linear combination $\sum_A \lambda_A f_A = 0$ (as an equality of functions $\{0,1\}^n \to \mathbb{F}_p$), if we evaluate both sides at $v_B$, we're left with $\lambda_B f_B(v_B) = 0$, so that $\lambda_B = 0$ for all $B$. So now it suffices to calculate the dimension of the space of functions that the $f_A$s live in. The dimension of the space of all functions $\{0,1\}^n \to \mathbb{F}_p$ is $2^n$, but $|\mathcal{F}| \le 2^n$ is a useless bound.

Instead, notice that all $f_A$s are polynomials, and their degrees are all $p - 1$, so we can consider the subspace of all polynomials of degree at most $p - 1$. This space looks like it has dimension $\binom{n+p-1}{p-1}$ by stars and bars, which is not quite good enough. But there's a subtlety here: remember that our domain is $\{0,1\}^n$, so that $x_i^2$ is actually the same function as $x_i$. So exponents of 2 or higher always disappear, and over the domain we're considering, the dimension of the subspace is indeed just $\binom{n}{0} + \binom{n}{1} + \dots + \binom{n}{p-1}$ (because for each degree $d$, we pick $d$ different elements of $\{x_1, \dots, x_n\}$ to multiply together to form a monomial). And we can take our functions $f_A$, multiply them out, and then remove the higher exponents from the various $x_i$s; this doesn't change the function and keeps the degree at most $(p - 1)$. Thus $|\mathcal{F}|$ must indeed be at most this dimension, proving the result.
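The key property of $f_A$ can be brute-forced for small parameters. The sketch below (with $p = 3$ and $n = 7$ chosen by us only to keep the search small) checks that $f_A(v_B)$ is nonzero mod $p$ exactly when $|A \cap B| \equiv p - 1 \pmod p$, and that the product form agrees with the power form from Remark 27 on every residue.

```python
from itertools import combinations

p, n = 3, 7          # assumed small parameters; p must be prime
k = 2 * p - 1        # size of each set in the family


def product_form(y):
    """prod_{s=0}^{p-2} (y - s), reduced mod p."""
    out = 1
    for s in range(p - 1):
        out = out * (y - s) % p
    return out


def power_form(y):
    """(y - (p-1))^(p-1) - 1, reduced mod p (the form in Remark 27)."""
    return (pow((y - (p - 1)) % p, p - 1, p) - 1) % p


def f(A, x):
    """f_A(x), where x is a 0/1 vector indexed by range(n)."""
    return product_form(sum(x[i] for i in A))


def char_vec(B):
    return [1 if i in B else 0 for i in range(n)]


sets = [frozenset(c) for c in combinations(range(n), k)]
# f_A(v_B) != 0 exactly when |A ∩ B| is congruent to p - 1 mod p.
consistent = all(
    (f(A, char_vec(B)) != 0) == (len(A & B) % p == p - 1)
    for A in sets for B in sets
)
remark_ok = all(product_form(y) == power_form(y) for y in range(p))
```

Both checks depend only on the value of $\sum_{i \in A} x_i$ mod $p$, which is the reason working in $\mathbb{F}_p$ lets us drop the factors with $s \ge p$.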


(We can ask about generalizing this statement to non-prime $p$ or more general sizes of sets and intersections; we might talk more about this next lecture.) With that, we'll now turn to the geometric application, starting with a somewhat unmotivated lemma that will be used as a black box:

Corollary 28

Let $p$ be a prime, and let $n = 4p$. Let $\mathcal{F}$ be a family of $(2p - 1)$-element subsets of $\{1, \dots, n\} = \{1, \dots, 4p\}$ such that no two members of $\mathcal{F}$ intersect in exactly $(p - 1)$ elements. Then
$$|\mathcal{F}| \le \frac{1}{1.1^n} \binom{4p}{2p-1}.$$

In other words, we get an exponentially small bound on the fraction of total $(2p - 1)$-element subsets we can have at once.


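Proof. By Theorem 26, we know that
$$|\mathcal{F}| \le \binom{4p}{0} + \binom{4p}{1} + \dots + \binom{4p}{p-1} \le \binom{4p}{p}\left(\frac{1}{3} + \frac{1}{3^2} + \frac{1}{3^3} + \cdots\right),$$
because inductively, going from $\binom{n}{k}$ to $\binom{n}{k-1}$ requires multiplying by $\frac{k}{n-k+1}$, which is always at most $\frac{1}{3}$ because we always have $n \ge 4k$ here. Evaluating the geometric series, we find that
$$\frac{|\mathcal{F}|}{\binom{4p}{2p-1}} \le \frac{1}{2} \cdot \frac{\binom{4p}{p}}{\binom{4p}{2p-1}} = \frac{1}{2} \cdot \frac{(2p-1)!\,(2p+1)!}{p!\,(3p)!}.$$
Expanding this out and canceling factors, we find that
$$\frac{|\mathcal{F}|}{\binom{4p}{2p-1}} \le \frac{1}{2} \cdot \frac{(2p-1)(2p-2)\cdots(p+1)}{(3p)(3p-1)\cdots(2p+2)} \le \frac{1}{2}\left(\frac{2}{3}\right)^{p-1},$$
because each corresponding pair of terms in the numerator and denominator has ratio at most $\frac{2}{3}$. Thus we indeed find
$$|\mathcal{F}| \le \frac{1}{2}\left(\frac{2}{3}\right)^{p-1}\binom{4p}{2p-1} \le \frac{1}{1.1^{4p}}\binom{4p}{2p-1}$$
(the last step uses $\frac{3}{4}\left(\frac{2}{3}\right)^{p} \le 1.1^{-4p}$, which holds because $1.1^4 = 1.4641 < \frac{3}{2}$).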
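The chain of inequalities in this proof is easy to confirm numerically with exact arithmetic. The sketch below is only a check for a few small primes, not a proof; the function name and the choice of primes are ours.

```python
from fractions import Fraction
from math import comb


def check(p):
    """Return the truth values of the three steps in the proof for n = 4p."""
    n = 4 * p
    total = comb(n, 2 * p - 1)                      # all (2p-1)-subsets
    fw_bound = sum(comb(n, k) for k in range(p))    # bound from Theorem 26
    # Geometric series step: the sum is at most half of C(4p, p).
    step1 = 2 * fw_bound <= comb(n, p)
    ratio = Fraction(fw_bound, total)
    # Ratio step: at most (1/2) * (2/3)^(p-1).
    step2 = ratio <= Fraction(1, 2) * Fraction(2, 3) ** (p - 1)
    # Final form: at most 1.1^(-n).
    step3 = ratio * Fraction(11, 10) ** n <= 1
    return step1, step2, step3
```

Calling `check(p)` for the first few primes returns three `True` values each time, consistent with the exponentially small fraction claimed above.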


We can now present the geometric conjecture that we've been alluding to: 


Definition 29 
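The diameter of a set $X \subseteq \mathbb{R}^d$, denoted $\operatorname{diam}(X)$, is given by
$$\operatorname{diam}(X) = \sup_{x, y \in X} \|x - y\|.$$

Conjecture 30 (Borsuk's conjecture)
Every subset $X \subseteq \mathbb{R}^d$ with finite diameter can be partitioned into $(d + 1)$ subsets $X_1, \dots, X_{d+1}$ such that $\operatorname{diam}(X_i) < \operatorname{diam}(X)$ for all $i$.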


This conjecture is clearly false if we use fewer than $d + 1$ subsets, because we can take the $(d + 1)$ vertices of a simplex of side length 1: then the diameter of the set is 1, but the pigeonhole principle tells us that if we partition into $d$ sets, one of those must have at least two points, and thus the diameter of that set would be 1. For historical context, Borsuk first considered the unit ball, for which he proved that partitioning it into $d$ subsets of smaller diameter was impossible but that $(d + 1)$ subsets was possible. And this conjecture was not stated directly by Borsuk, but it was motivated by his result and quite widely believed (it was proved for dimension $d = 2, 3$ and also for smooth convex sets in $\mathbb{R}^d$; note that the convexity is not the issue, because the diameter of a set is the same as the diameter of its convex hull).


But here’s how the conjecture was disproved (in fact, shown to be very false): 


Theorem 31 (Kahn and Kalai (1993)) 


Let $p$ be a prime and $n = 4p$. Then there exists a set $X \subseteq \mathbb{R}^{n^2}$ of finitely many points (so finite diameter), such that for every partition of $X$ into subsets $X_1, \dots, X_m$ with $m < 1.1^n$, we have $\operatorname{diam}(X_i) = \operatorname{diam}(X)$ for some $X_i$.


This result says that we can take exponentially many subsets (with exponent proportional to the square root of the dimension of the space) and still not be able to reduce the diameter! In particular, whenever $1.1^n > n^2 + 1$ (for example, $n > 100$ and $n$ is four times a prime), we get a counterexample to Borsuk's conjecture. On the other hand, we can show that we can always reduce the diameter using exponentially many subsets (with a larger base), but we'll talk about that next time as well.

We'll do the preparatory steps for the proof and show the result next time. As a spoiler, we'll be looking at all $(2p - 1)$-element subsets of $\{1, \dots, 4p\}$ and associating each one to a point in $X$. What's interesting is that we're in dimension $n^2$ here, and we'll handle that with tensor product notation: for $x, y \in \mathbb{R}^n$, we let $x \otimes y \in \mathbb{R}^{n^2}$ have entries $x_i y_j$ for $1 \le i, j \le n$ (in other words, consider the entries of the $n \times n$ matrix $x y^T$, but we think of this as a vector in $\mathbb{R}^{n^2}$).


The claim is now that for any $x, y \in \mathbb{R}^n$, we have
$$(x \otimes x) \cdot (y \otimes y) = (x \cdot y)^2,$$
where the dot products are in $\mathbb{R}^{n^2}$ and $\mathbb{R}^n$ on the left and right sides, respectively. Indeed, this is just a computation:
$$(x \otimes x) \cdot (y \otimes y) = \sum_{i=1}^{n} \sum_{j=1}^{n} (x \otimes x)_{ij} (y \otimes y)_{ij} = \sum_{i=1}^{n} \sum_{j=1}^{n} x_i x_j y_i y_j = \Big(\sum_{i=1}^{n} x_i y_i\Big)\Big(\sum_{j=1}^{n} x_j y_j\Big) = (x \cdot y)(x \cdot y).$$
Next lecture, we'll take the $(2p - 1)$-element subsets and associate to each of them a vector in $\mathbb{R}^{n^2}$, and that will allow us to contradict the medium-size intersection result if we have a set with too many elements.
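The identity is also quick to confirm on random integer vectors. In this sketch the helper names `tensor` and `dot` are ours, and the vectors are arbitrary.

```python
import random


def tensor(x, y):
    """Entries x_i * y_j for all (i, j), flattened into a vector of length n^2."""
    return [xi * yj for xi in x for yj in y]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


random.seed(0)
n = 5
x = [random.randint(-9, 9) for _ in range(n)]
y = [random.randint(-9, 9) for _ in range(n)]

lhs = dot(tensor(x, x), tensor(y, y))   # dot product in R^(n^2)
rhs = dot(x, y) ** 2                    # dot product in R^n, squared
```

Using integers keeps the comparison exact, so `lhs == rhs` holds without any floating-point tolerance.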


7 February 24, 2022 


Last lecture, we started constructing a counterexample to Borsuk's conjecture. As a reminder, the conjecture claims that a set $X \subseteq \mathbb{R}^d$ of finite diameter can be split into $(d + 1)$ pieces each with smaller diameter than $X$ (motivated by the arguments for breaking up a $d$-dimensional ball or a simplex), and what Kahn and Kalai showed is that for $n = 4p$ ($p$ prime), we can construct a finite set of points $X \subseteq \mathbb{R}^{n^2}$ such that for every partition of $X$ into $m < 1.1^n$ pieces, one of the pieces must have the same diameter as the original set $X$. (So as long as $1.1^n > n^2 + 1$, the conjecture is false, and in fact asymptotically the conjecture is extremely incorrect.)

To start the proof, we mentioned last time that if $x, y \in \mathbb{R}^n$ are two vectors, we can form a vector $x \otimes y \in \mathbb{R}^{n^2}$ whose entries are $x_i y_j$ for $1 \le i, j \le n$ (we're not thinking of this as a tensor product, though the entry-wise multiplication is motivated that way). In particular, we noticed that $(x \otimes x) \cdot (y \otimes y) = (x \cdot y)^2$, and we'll now use that to find our finite set of points disproving Borsuk's conjecture.

Proof. Our construction works as follows: remembering our result about medium-size intersections of sets (specifically $(2p - 1)$-element subsets of $\{1, \dots, 4p\}$), we'll let $\mathcal{A}$ be the family of all $(2p - 1)$-element subsets of $\{1, \dots, 4p\} = \{1, \dots, n\}$. We have $|\mathcal{A}| = \binom{4p}{2p-1} = \binom{n}{2p-1}$, and now we will associate to every subset in $\mathcal{A}$ a point in $X \subseteq \mathbb{R}^{n^2}$. Since the characteristic vector is $n$-dimensional, we'll want to use the "tensoring" operation we mentioned above. Furthermore, because the characteristic vector has a lot of zeros (which creates lots of zeros after tensoring), we'll make a slight modification to it as well: let $u_A \in \{1, -1\}^n$ be the signed characteristic vector, meaning that $u_A$ has 1 in the $j$th index if $j \in A$ and $-1$ otherwise. Then for each $A \in \mathcal{A}$, we define
$$p_A = u_A \otimes u_A \in \mathbb{R}^{n^2}$$
(meaning that the $(i, j)$ entry of $p_A$ is 1 if $i, j$ are both in or both not in $A$, and $-1$ otherwise), and we'll let $X$ be the set of all of these points:
$$X = \{p_A : A \in \mathcal{A}\}.$$

We claim that $|X| = |\mathcal{A}| = \binom{n}{2p-1}$, and to show that we need to prove that the $p_A$s are all distinct. Indeed, $(p_A)_{ij} = (u_A)_i (u_A)_j$, so just looking at the $(1, j)$ entries tells us the vector $u_A$ up to a sign (specifically the sign $(u_A)_1$). And furthermore, $u_A$ will have $(2p - 1)$ 1s and $(2p + 1)$ $-1$s, so we can then figure out the sign by counting up the number of 1s in the $(1, j)$ entries of $p_A$. Thus $p_A$ uniquely determines $u_A$. (This wouldn't have worked if we had $(2p)$-element subsets: $A$ and its complement $A^c$ would give us the same $p_A$, for example.)

We now wish to prove our partition property, and to do that we must understand the diameter of $X$, meaning we need to do some work to understand how distances behave in $X$. Let $A, B \in \mathcal{A}$. Then
$$\begin{aligned} \|p_A - p_B\|^2 &= (p_A - p_B) \cdot (p_A - p_B) \\ &= p_A \cdot p_A - 2 p_A \cdot p_B + p_B \cdot p_B \\ &= (u_A \otimes u_A) \cdot (u_A \otimes u_A) - 2 (u_A \otimes u_A) \cdot (u_B \otimes u_B) + (u_B \otimes u_B) \cdot (u_B \otimes u_B), \end{aligned}$$
and now we can use our identity $(x \otimes x) \cdot (y \otimes y) = (x \cdot y)^2$ on every term here to get
$$(u_A \cdot u_A)^2 - 2 (u_A \cdot u_B)^2 + (u_B \cdot u_B)^2 = n^2 - 2 (u_A \cdot u_B)^2 + n^2$$
(here we must remember that all entries of $u_A$ and $u_B$ are in $\{-1, 1\}$, not $\{0, 1\}$). Finally, the sum $u_A \cdot u_B$ has terms of 1 for elements in either both or neither of $A$ and $B$, and terms of $-1$ for elements in the symmetric difference, meaning that
$$u_A \cdot u_B = |A \cap B| + |\{1, \dots, n\} \setminus (A \cup B)| - |A \setminus B| - |B \setminus A| = n - 2|A \setminus B| - 2|B \setminus A|$$
(we can draw a Venn diagram to check this, for example). And now because $|A \setminus B| = |A| - |A \cap B|$ and $|A| = 2p - 1$ (similarly for $B$), we arrive at
$$u_A \cdot u_B = n - 2((2p - 1) - |A \cap B|) - 2((2p - 1) - |A \cap B|) = 4p - 4((2p - 1) - |A \cap B|) = \boxed{4(|A \cap B| - (p - 1))}.$$
Substituting this back into the norm calculation, we find that
$$\boxed{\|p_A - p_B\|^2 = 2n^2 - 32\,[|A \cap B| - (p - 1)]^2},$$
and thus the distance between two points depends only on $|A \cap B|$. Furthermore, we now know the diameter of $X$: the largest possible distance occurs when $[|A \cap B| - (p - 1)]^2$ is minimized, and this occurs when $|A \cap B| = p - 1$ (corresponding to a squared distance of $2n^2$).

So now suppose we have a partition of $X = \{p_A : A \in \mathcal{A}\}$ into sets $X_1, \dots, X_m$ for $m < 1.1^n$. Then there is some part $X_i$ such that $|X_i| \ge \frac{|X|}{m} > \frac{1}{1.1^n}\binom{n}{2p-1}$ (for example, pick the largest part). And now we can use (the contrapositive of) our corollary from last lecture: let $\mathcal{F}$ be the set of $A$s such that $p_A \in X_i$, so that $X_i = \{p_A : A \in \mathcal{F}\}$. Then $\mathcal{F}$ is a family of $(2p - 1)$-element subsets of $\{1, \dots, n\}$, and because $|\mathcal{F}| = |X_i| > \frac{1}{1.1^n}\binom{n}{2p-1}$, there must be two members of $\mathcal{F}$ that intersect in exactly $(p - 1)$ elements. Then the two corresponding points $p_A$ are separated by the maximum distance in $X$, and thus (because $X_i$ is a subset of $X$) $\operatorname{diam}(X_i) = \operatorname{diam}(X)$, as desired.
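The distance formula from this proof can be verified exhaustively in the smallest case. The sketch below assumes $p = 2$ (so $n = 8$ and the sets have 3 elements), builds all the points $p_A$, and compares every pairwise squared distance with $2n^2 - 32(|A \cap B| - (p-1))^2$. Names are ours.

```python
from itertools import combinations

p = 2               # assumed smallest prime, to keep the check fast
n = 4 * p


def signed_vec(A):
    """u_A: +1 on elements of A, -1 elsewhere."""
    return [1 if i in A else -1 for i in range(n)]


def point(A):
    """p_A = u_A tensor u_A, flattened to length n^2."""
    u = signed_vec(A)
    return tuple(ui * uj for ui in u for uj in u)


def sq_dist(a, b):
    return sum((s - t) ** 2 for s, t in zip(a, b))


family = [frozenset(c) for c in combinations(range(n), 2 * p - 1)]
pts = {A: point(A) for A in family}

formula_ok = all(
    sq_dist(pts[A], pts[B]) == 2 * n * n - 32 * (len(A & B) - (p - 1)) ** 2
    for A, B in combinations(family, 2)
)
max_sq = max(sq_dist(pts[A], pts[B]) for A, B in combinations(family, 2))
distinct = len(set(pts.values())) == len(family)
```

Here the maximum squared distance equals $2n^2 = 128$, attained exactly by the pairs with $|A \cap B| = p - 1 = 1$, matching the description of the diameter.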


We've thus proved that not only is Borsuk's conjecture false for $d + 1$ subsets, it's also false if we try to break into $1.1^{\sqrt{d}}$ subsets. It makes sense to ask whether there is a version of Borsuk's conjecture that can be salvaged, and the answer is yes:

Proposition 32

Every set $X \subseteq \mathbb{R}^d$ of finite diameter can be partitioned into at most $7^d$ subsets such that each subset has strictly smaller diameter than $X$.


Proof. Let $X$ be a subset of $\mathbb{R}^d$ with diameter $t$. Find a maximal set of points $q_1, \dots, q_m \in X$ of pairwise distance at least $\frac{t}{3}$ (meaning that every point in $X$ is at distance less than $\frac{t}{3}$ from one of those points $q_i$), and letting $X_i = \{x \in X : \|x - q_i\| < \frac{t}{3}\}$, we have $X = X_1 \cup \dots \cup X_m$. This is not a partition yet, but we can just make the sets smaller until they have no overlap, which can only decrease the diameter. Then $X_i$ has diameter at most $\frac{2}{3}t < t = \operatorname{diam}(X)$, so it suffices to prove that $m \le 7^d$. Indeed, the open balls of radius $\frac{t}{6}$ around the $q_i$s are disjoint because the $q_i$s are separated by a distance of at least $\frac{t}{3}$, and their union is contained within a ball of radius $t + \frac{t}{6} = \frac{7t}{6}$ around some fixed point $x \in X$ (because all $q_i$ are at distance at most $t$ from $x$, and then points in the balls of radius $\frac{t}{6}$ can only be a further $\frac{t}{6}$ away by the triangle inequality). Since the volume of the big ball is $7^d$ times larger than the volume of each small ball, and the small balls are disjoint and contained in the big ball, this shows $m \le 7^d$ as desired.


Fact 33 
The next natural question is to ask for the best function $f(d)$ that we can put in Borsuk's conjecture and still have it be true: we have a lower bound of $1.1^{\sqrt{d}}$ and an upper bound of $7^d$ (in fact $5^d + 1$ if we optimize the constants in the proof). The best known upper bound is $(\sqrt{1.5} + o(1))^d$, and the best known lower bound (that is, how many subsets are necessary) is $(1.225\ldots)^{\sqrt{d}}$. So the only additional information we know is some improvements in the bases of the exponentials, and there is still a huge gap in what is known.


Remark 34. Last lecture, we also had some questions about how important it was for p to be prime in the medium- 
size-intersections problem, as well as whether we needed the forbidden intersection size to be “middle.” In fact, all that 
really mattered in our proof was how many residue classes mod p are forbidden, so we could have also not allowed 
either intersections of size 3 or (p + 3) for our (2p — 1)-element subsets, for example. It all basically comes down to 
modifying the proof from last lecture whenever we're working mod p. 

Meanwhile, if p is not a prime, questions become much more difficult — all of the current methods for proofs rely 


on linear algebra, and thus there are barely any results known. 


We'll close this lecture with a quick riddle (looking ahead to next lecture):

Problem 35
Consider the subset $X = \{(1, 2^1), (2, 2^2), (3, 2^3), \dots, (10^6, 2^{10^6})\} = \{(j, 2^j) : j \in \mathbb{Z},\ 1 \le j \le 10^6\} \subseteq \mathbb{R}^2$. If $P \in \mathbb{R}[x, y]$ is a nonzero polynomial that vanishes on all points in $X$, what is the minimum possible degree of $P$?

We can get a degree of 1000000 by using $P(x, y) = \prod_{j=1}^{10^6} (x - j)$, and we can get half that degree (500000) by splitting up the points into pairs, drawing a line through each pair, and multiplying those lines together. We can then go one step further, dividing $X$ into 5-tuples of points, finding a quadratic polynomial (conic) through each, and multiplying those together to get a degree of 400000. The next step is to use a cubic through nine points, and so on. It turns out that we can actually get a polynomial of degree less than 1500 to work. But we'll talk about this next lecture!


8 March 1, 2022 


Last lecture, we discussed the question of finding a polynomial of small degree vanishing at particular points. We'll 


solve that problem today, starting with a useful result: 


Lemma 36 


Let $\mathbb{F}$ be a field, and let $X \subseteq \mathbb{F}^n$ be a subset (of $n$-tuples) of size less than $\binom{d+n}{n}$ for some nonnegative integer $d$. Then there exists a nonzero polynomial $P \in \mathbb{F}[x_1, \dots, x_n]$ of degree $\deg P \le d$, such that $P(q) = 0$ for all $q \in X$.

As an example, last lecture we looked at the points $(j, 2^j)$ for $j \in \{1, \dots, 10^6\}$. Then setting $n = 2$ and $d = 1413$, we get $10^6 < \binom{1413 + 2}{2}$, so we can find a polynomial $P(x_1, x_2)$ of degree at most 1413 so that $P(j, 2^j) = 0$ for all $j \in \{1, \dots, 10^6\}$.
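The value $d = 1413$ is the smallest degree for which the lemma applies to a million points in the plane, and that is a one-line search. The function name below is ours.

```python
from math import comb


def min_degree(num_points, n=2):
    """Smallest d with num_points < C(d + n, n), i.e. where Lemma 36 applies."""
    d = 0
    while comb(d + n, n) <= num_points:
        d += 1
    return d


smallest = min_degree(10 ** 6)   # 1413, since C(1415, 2) = 1000405 > 10^6
```

For comparison, $\binom{1414}{2} = 998991$ falls just short of $10^6$, so $d = 1412$ is not enough for this counting argument.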


We'll prove this lemma essentially just using linear algebra: 


Proof. Let $V$ be the vector space of all polynomials $P \in \mathbb{F}[x_1, \dots, x_n]$ of degree at most $d$ (this space does contain 0 even though we're not ultimately allowed to use it as our polynomial $P$). By stars and bars, the dimension of this space is the number of monomials in $x_1, \dots, x_n$ of degree at most $d$, which is $\binom{d+n}{n}$. Suppose $X = \{q_1, \dots, q_M\}$, so that $\boxed{M = |X| < \binom{d+n}{n}}$ by assumption. Now consider the linear map $V \to \mathbb{F}^M$ which maps $P$ to its values on $X$:
$$P \mapsto (P(q_1), P(q_2), \dots, P(q_M)).$$
Our goal is to show that the kernel of this map is nontrivial (since we want $P(q_i) = 0$ for all $i$). But this is true because the dimension of $V$ is larger than the dimension of $\mathbb{F}^M$ (by the boxed statement above), so there is a nonzero polynomial $P$ with the desired property.
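The proof is constructive enough to run: a nonzero kernel vector of the evaluation map can be found by Gaussian elimination. The sketch below works over the rationals with exact `Fraction` arithmetic; the function names and the dictionary representation of a polynomial (exponent tuple to coefficient) are our own choices.

```python
from fractions import Fraction
from itertools import product


def monomials(n, d):
    """Exponent tuples of total degree at most d in n variables."""
    return [e for e in product(range(d + 1), repeat=n) if sum(e) <= d]


def evaluate(exps, point):
    """Value of the monomial with exponents `exps` at `point`."""
    out = Fraction(1)
    for e, x in zip(exps, point):
        out *= Fraction(x) ** e
    return out


def vanishing_polynomial(points, d):
    """Nonzero P of degree <= d vanishing on `points`, as {exponents: coeff}.

    Assumes len(points) < len(monomials(n, d)), as in the lemma.
    """
    mons = monomials(len(points[0]), d)
    rows = [[evaluate(m, q) for m in mons] for q in points]
    pivots, r = [], 0
    for c in range(len(mons)):          # reduce to row echelon form
        pr = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pr is None:
            continue
        rows[r], rows[pr] = rows[pr], rows[r]
        inv = 1 / rows[r][c]
        rows[r] = [v * inv for v in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
        if r == len(rows):
            break
    # More columns than rows, so some column is free: set it to 1.
    free = next(c for c in range(len(mons)) if c not in pivots)
    coeffs = [Fraction(0)] * len(mons)
    coeffs[free] = Fraction(1)
    for row_idx, c in enumerate(pivots):
        coeffs[c] = -rows[row_idx][free]
    return dict(zip(mons, coeffs))
```

For instance, `vanishing_polynomial([(j, 2 ** j) for j in range(1, 6)], 2)` returns a nonzero conic through the first five points of the riddle, since $5 < \binom{4}{2} = 6$.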


Remark 37. Another way to think about this proof is that requiring $P(q) = 0$ is a linear constraint on the coefficients of the monomials, and if we have $\binom{d+n}{n}$ coefficients and only $|X|$ linear relations, there must be a nonzero choice of coefficients satisfying the relations.

We'll now explain why this lemma is useful beyond the puzzle that we stated last time. It turns out this result is pretty powerful, and we'll now see how it applies to the finite field Kakeya and Nikodym problems. But first we'll mention some other useful facts:


Fact 38 


If $P \in \mathbb{F}[x]$ is a (one-variable) nonzero polynomial over a field $\mathbb{F}$ and $\deg P \le d$, then there are at most $d$ different elements $a \in \mathbb{F}$ such that $P(a) = 0$.

As a reminder, we basically just factor out $(x - a)$ from the polynomial for each of the different roots $a$. And we're stating this fact with $\deg P \le d$ instead of $\deg P = d$ just for convenience, though it's not really changing the statement because we can always just use a smaller $d$.

Fact 39

Let $q$ be a prime power, and let $P \in \mathbb{F}_q[x_1, \dots, x_n]$ be a polynomial of degree at most $(q - 1)$ in each variable $x_i$. Then if $P(x_1, \dots, x_n) = 0$ for all $(x_1, \dots, x_n) \in \mathbb{F}_q^n$, then $P$ must be the zero polynomial.

In other words, if all exponents of all variables in all monomials of $P$ are smaller than $q$, then we can't have $P$ vanish everywhere unless $P$ is identically zero. This is one of our homework problems. As a note, we need the restriction on degree because a polynomial like $x_1^q - x_1$ vanishes on all $n$-tuples of $\mathbb{F}_q^n$ by Fermat's little theorem, but it is not the zero polynomial.

We're now ready to discuss more exciting results, starting with the finite field Nikodym problem. The finite field 
Kakeya problem fell first in 2009, with the methods also working for the finite field Nikodym problem, but we'll talk 


about them in the opposite order because the technical details are clearer that way: 


Definition 40 


A set $N \subseteq \mathbb{F}_q^n$ is a Nikodym set if for all $x \in \mathbb{F}_q^n$, there is an affine line $\ell \subseteq \mathbb{F}_q^n$ such that $x \in \ell$ and $\ell \setminus \{x\} \subseteq N$.

In other words, for every point $x \in \mathbb{F}_q^n$, there is a line $\ell$ (not necessarily through the origin) through $x$ such that the entire line, except possibly $x$ itself, is in $N$. So the set $N$ contains a lot of "almost-complete lines."


Fact 41 

Over the real numbers, the condition for being a Nikodym set is the same, though to avoid having the set be too large we restrict the space of consideration to $[0, 1]^n \subseteq \mathbb{R}^n$. It turns out that there is a Nikodym set of measure zero, even though there are so many almost-complete lines in the set. (On the other hand, it's an open problem to show that the Hausdorff dimension of a Nikodym set $N \subseteq [0, 1]^n$ must still be $n$, and there are connections of this problem to harmonic analysis.)


Some examples of Nikodym sets include $\mathbb{F}_q^n$, or $\mathbb{F}_q^n$ without a point, line, or affine hyperplane (choose our lines away from that hyperplane if $x$ is in the hyperplane and parallel to the hyperplane otherwise). But this set is still pretty large (it has size $q^n - q^{n-1}$), and we're curious how much smaller we can make a Nikodym set. It turns out that we do need to have "very big" Nikodym sets over finite fields:

Theorem 42

Any Nikodym set $N \subseteq \mathbb{F}_q^n$ has size at least $|N| \ge \binom{q-2+n}{n} \ge \frac{1}{2^n n!} q^n$.

In particular, if we fix the dimension $n$ and consider a large $q$, we must still have a constant fraction of all points in $\mathbb{F}_q^n$ in our Nikodym set. (And the factor of 2 in the denominator really doesn't matter; it comes from the fact that we have $q - 1 \ge \frac{q}{2}$ in the expansion of the binomial coefficient.) This is in stark contrast to the real case, and it was very surprising when the result first came out!


Proof. Suppose for the sake of contradiction that there were a Nikodym set of size $|N| < \binom{q-2+n}{n}$. By Lemma 36, there is a nonzero polynomial $P \in \mathbb{F}_q[x_1, \dots, x_n]$ of degree at most $(q - 2)$ which vanishes on all of $N$.

We claim that this implies $P(x) = 0$ for all $x \in \mathbb{F}_q^n$. Indeed, for an arbitrary $x \in \mathbb{F}_q^n$, by the properties of a Nikodym set, there is some affine line $\ell \subseteq \mathbb{F}_q^n$ through $x$ with $\ell \setminus \{x\} \subseteq N$. Let $v$ be a nonzero vector in the direction of $\ell$ so that we can parameterize $\ell = \{x + tv : t \in \mathbb{F}_q\}$. We then find that
$$x + tv \in \ell \setminus \{x\} \subseteq N \quad \forall t \in \mathbb{F}_q \setminus \{0\},$$
so for all $t \in \mathbb{F}_q \setminus \{0\}$, we have $P(x + tv) = 0$. Now because $v$ and $x$ have been fixed, we can think of our relation as a one-variable polynomial equation in $t$, and because $P$ has degree at most $(q - 2)$ and $t$ shows up linearly in the argument, $P(x + tv)$ is a polynomial in $t$ of degree at most $(q - 2)$. But now by Fact 38, because $P(x + tv)$ vanishes at the $(q - 1)$ values $t \in \mathbb{F}_q \setminus \{0\}$, it must indeed be the zero polynomial in $t$ (note: this is not the same as saying that $P$ is the zero polynomial, only that for a fixed $x$ and $v$ it's zero in $t$), meaning that $P(x + 0v) = P(x) = 0$ as well. This proves the claim.

Finally, applying Fact 39, since $P(x) = 0$ for all $x \in \mathbb{F}_q^n$ and $P$ has degree at most $(q - 1)$ in each variable $x_i$ (since $P$ overall only has degree at most $(q - 2)$), $P$ must be the zero polynomial, a contradiction. Thus we must have $|N| \ge \binom{q-2+n}{n}$ as desired.
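For very small parameters the Nikodym property can be tested by brute force. The sketch below assumes a prime $q$ (so that arithmetic mod $q$ is the field $\mathbb{F}_q$), takes $q = 3$ and $n = 2$, and checks the example of $\mathbb{F}_q^n$ with the hyperplane $x_1 = 0$ removed against the bound of Theorem 42. Names are ours.

```python
from itertools import product
from math import comb

q, n = 3, 2   # assumed: q prime, so integers mod q form the field F_q
space = list(product(range(q), repeat=n))
# Example from the text: F_q^n minus the hyperplane x_1 = 0.
N = {x for x in space if x[0] != 0}


def line(x, v):
    """Points x + t*v for t = 0, 1, ..., q-1 (the first entry is x itself)."""
    return [tuple((xi + t * vi) % q for xi, vi in zip(x, v)) for t in range(q)]


def is_nikodym(S):
    """Every x must have a direction v with (line through x) minus x inside S."""
    directions = [v for v in space if any(v)]
    return all(
        any(all(pt in S for pt in line(x, v)[1:]) for v in directions)
        for x in space
    )


lower_bound = comb(q - 2 + n, n)
```

In this case the example set has $q^n - q^{n-1} = 6$ points, comfortably above the lower bound of 3, while a single point fails the Nikodym test.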


We'll finish by introducing the finite field Kakeya problem, again using a finite field $\mathbb{F}_q$ for a prime power $q$:

Definition 43

A set $K \subseteq \mathbb{F}_q^n$ is a Kakeya set if it contains a line in every direction. In other words, for every vector $a \in \mathbb{F}_q^n \setminus \{0\}$, there is some $b \in \mathbb{F}_q^n$, potentially zero, such that $\{b + ta : t \in \mathbb{F}_q\} \subseteq K$.

Just like before, we're curious how small a Kakeya set may be, and we'll again see that the Kakeya set takes up a constant fraction of the total space for fixed $n$:

Theorem 44 (Dvir, 2009)
Any Kakeya set $K \subseteq \mathbb{F}_q^n$ has size at least $|K| \ge \binom{q+n-1}{n} \ge \frac{1}{n!} q^n$.


Fact 45 


In the real setting, a set is a Kakeya set if it contains a unit interval in every direction, and again in $\mathbb{R}^n$ we have a situation where there are Kakeya sets of measure zero, but it's an open problem to show that the Hausdorff dimension of a Kakeya set must be $n$.


We'll discuss the proof next time (we've been shifting gears to using more polynomial methods), but one note is that finite fields aren't necessarily more rigid than the real numbers: there are often situations where the behavior is very similar. For a more recent example where the two settings do behave differently, we'll discuss the cap set problem later in the course.


9 March 3, 2022 


Today, we'll prove the bound for the finite field Kakeya problem that we described last lecture. Recall that a Kakeya set in $\mathbb{F}_q^n$ (for some prime power $q$) is a set that contains a line of the form $\{b + ta : t \in \mathbb{F}_q\}$ for any “direction” $a \in \mathbb{F}_q^n \setminus \{0\}$, and we are trying to prove that any Kakeya set has size at least $\binom{q+n-1}{n} \ge \frac{1}{n!} q^n$ (which is a constant fraction of the whole set for fixed $n$). This proof will look pretty similar to the proof of the finite field Nikodym problem, in which we combined a few lemmas about polynomials.


Proof. Suppose for the sake of contradiction that there existed some Kakeya set with $|K| < \binom{q+n-1}{n}$. Then, analogous to last lecture, we can find a nonzero polynomial $P \in \mathbb{F}_q[x_1, \dots, x_n]$ of degree $d \le q - 1$ which vanishes on all of $K$. For technical reasons, we'll need to exclude the case where $d = 0$, and indeed this is fine because nonzero constants do not vanish on $K$, and $K$ is nonempty because it needs to contain a line. So $P$ is a polynomial of degree between $1$ and $q - 1$, inclusive.

Our next step in the Nikodym proof was to prove that $P$ vanishes everywhere, but that isn't going to work directly here because the condition looks different. Instead, let's work with what we have: we're told that for every $a \in \mathbb{F}_q^n \setminus \{0\}$, there is a vector $b \in \mathbb{F}_q^n$ such that $\{b + ta : t \in \mathbb{F}_q\} \subseteq K$. Thus, $P(b + ta) = 0$ for all $t \in \mathbb{F}_q$, and if we look at this as just a polynomial in $t$ with fixed $a, b$, we have $q$ roots (all $t \in \mathbb{F}_q$) but only degree at most $d \le q - 1$ (again, remember that the degree in $t$ can be less than the degree of $P$ itself because of cancellation), so (applying another fact from last time) $P(b + ta)$ must be the zero polynomial in $t$. (Remember that over finite fields, polynomials like $x^q - x$ show that vanishing everywhere does not tell us that we have the zero polynomial; the degree constraint is indeed necessary.)
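As a quick sanity check of that parenthetical remark, here is a minimal Python sketch. It assumes $q$ is prime, so that arithmetic mod $q$ models $\mathbb{F}_q$, and the helper names are ours rather than anything from lecture. It confirms that $x^q - x$ vanishes at every point of $\mathbb{F}_q$ even though it is not the zero polynomial.

```python
def evaluate(coeffs, t, q):
    """Evaluate sum(coeffs[i] * t^i) modulo the prime q."""
    return sum(c * pow(t, i, q) for i, c in enumerate(coeffs)) % q

def vanishes_everywhere(coeffs, q):
    """True if the polynomial is zero at every t in F_q = {0, ..., q-1}."""
    return all(evaluate(coeffs, t, q) == 0 for t in range(q))

def x_to_the_q_minus_x(q):
    """Coefficient list of x^q - x over F_q (a nonzero polynomial of degree q)."""
    coeffs = [0] * (q + 1)
    coeffs[1] = q - 1  # this is -1 mod q
    coeffs[q] = 1
    return coeffs

for q in (2, 3, 5, 7):
    print(q, vanishes_everywhere(x_to_the_q_minus_x(q), q))
```

By contrast, a nonzero polynomial of degree at most $q - 1$ (for instance $x + 1$, encoded as `[1, 1]`) cannot pass `vanishes_everywhere`, which is exactly the fact the proof uses.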

We can now look at the $t^d$ coefficient in $P(b + ta)$, which we know must be zero for any $a$. Since $P$ itself has degree $d$, only the leading terms can contribute, and we'll write

\[ P(x_1, \dots, x_n) = P_d(x_1, \dots, x_n) + P_{\le d-1}(x_1, \dots, x_n), \]

where the first term $P_d$ is the homogeneous degree-$d$ part of $P$ (and $P_{\le d-1}$ is the rest of the terms of $P$, which all have degree at most $d - 1$). Since we assume $P$ actually has degree $d$, we know that $P_d$ is a nonzero polynomial. Plugging our argument $b + ta$ into the polynomial $P$, notice that $P_{\le d-1}$ does not contribute any $t^d$ terms (because it only has degree at most $d - 1$), and furthermore we can't use any $b_i$ in the first term for $t^d$ terms because we must use the part with $t$ every time. (To be more clear, we can write $b + ta$ as $(b_1 + ta_1, b_2 + ta_2, \dots, b_n + ta_n)$ and imagine expanding it out.)
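To see the leading-coefficient claim in a concrete case, here is a small worked expansion. The polynomial below is a hypothetical example chosen only for illustration (with $n = 2$ and $d = 2$), not one from lecture.

```latex
P(x_1, x_2) = x_1 x_2 + x_1 + 1, \qquad P_2 = x_1 x_2, \qquad P_{\le 1} = x_1 + 1.

P(b + ta) = (b_1 + t a_1)(b_2 + t a_2) + (b_1 + t a_1) + 1
          = t^2 \, a_1 a_2 + t \, (a_1 b_2 + a_2 b_1 + a_1) + (b_1 b_2 + b_1 + 1).
```

The $t^2$ coefficient is $a_1 a_2 = P_2(a)$, with no dependence on $b$ and no contribution from the lower-degree part.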

Thus the contribution to the $t^d$ coefficient is just that of $P_d(ta) = t^d P_d(a)$, and remembering that this coefficient is zero, we find that $P_d(a) = 0$ for all nonzero $a \in \mathbb{F}_q^n$ (remembering that $a$ has to be a line direction). And $P_d(0) = 0$ as well, because $P_d$ is homogeneous of degree $d \ge 1$. Thus $P_d$ vanishes everywhere and has degree $d \le q - 1$, so it definitely has degree at most $q - 1$ in each variable and (by our homework lemma) $P_d$ is the zero polynomial. This is a contradiction, proving that our initial assumption of $|K| < \binom{q+n-1}{n}$ was incorrect.
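The bound can be sanity-checked by brute force in the very smallest cases. The sketch below is only an illustration: it assumes $q$ is prime (so that arithmetic mod $q$ models $\mathbb{F}_q$), it is feasible only for tiny parameters such as $q = 3$, $n = 2$, and the function names are ours.

```python
from itertools import combinations, product
from math import comb

def dvir_bound(q, n):
    """The lower bound binom(q + n - 1, n) from the theorem."""
    return comb(q + n - 1, n)

def min_kakeya_size(q, n):
    """Brute-force the smallest Kakeya set in F_q^n (q prime, tiny cases only)."""
    points = list(product(range(q), repeat=n))
    index = {p: i for i, p in enumerate(points)}
    # For each nonzero direction a, collect the lines {b + t a} as bitmasks of points.
    lines_by_direction = []
    for a in points:
        if not any(a):
            continue
        masks = set()
        for b in points:
            mask = 0
            for t in range(q):
                p = tuple((bi + t * ai) % q for ai, bi in zip(a, b))
                mask |= 1 << index[p]
            masks.add(mask)
        lines_by_direction.append(masks)
    # A Kakeya set contains a line, so it has at least q points.
    for size in range(q, len(points) + 1):
        for subset in combinations(range(len(points)), size):
            s = sum(1 << i for i in subset)
            if all(any((m & s) == m for m in masks) for masks in lines_by_direction):
                return size
    return None

print(min_kakeya_size(3, 2), dvir_bound(3, 2))
```

The theorem guarantees that the first printed number is at least the second; the search itself makes no use of polynomials.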


Fact 46 
The best known bound for this problem (Bukh–Chao 2021) is that $|K| \ge \frac{q^n}{(2 - 1/q)^{n-1}}$ (so basically we can improve our $\frac{1}{n!}$ factor to $\frac{1}{2^{n-1}}$). On the other hand, there is a known construction (published in 2008, basically right after Dvir's proof) of a Kakeya set $K \subseteq \mathbb{F}_q^n$ of size $\frac{1}{2^{n-1}} q^n + O_n(q^{n-1})$ ($O_n$ meaning asymptotics for $n$ fixed, $q$ large).

So we basically have matching lower and upper bounds here!


Fact 47 
On the other hand, results for the finite field Nikodym problem are more sparse, and even the smallest known constructions have size $(1 - o(1)) q^n$ (such as our example from last time deleting a hyperplane from $\mathbb{F}_q^n$). And this has been conjectured to be optimal.

We'll now turn to the joints problem, another geometrically-motivated combinatorics problem with ideas motivated by the Kakeya and Nikodym problems, but living in $\mathbb{R}^3$ instead of $\mathbb{F}_q^n$. We'll start with some easier riddles to motivate the statement:


Problem 48 


If we have a set of $L$ distinct lines in the plane, what is the maximum possible number of points that lie on at least two of the lines?

(The answer is $\binom{L}{2}$, because any two lines can only intersect in at most one point, and we can achieve this by choosing lines generically.)


Problem 49 
If we have a set of $L$ distinct lines in the plane, what is the maximum possible number of points that lie on at least three of the lines?

We can get $\Omega(L)$ points by picking $\frac{L-1}{2}$ points on one of the lines and having pairs of lines pass through each of those points, and we can improve that to $\Omega(L^2)$ by drawing a grid of horizontal, vertical, and down-right diagonal lines (or equivalently drawing an equilateral triangular grid). And we have an upper bound of $\frac{1}{3}\binom{L}{2}$ because each of the points must be counted in at least three of the pairs of lines from the previous problem.


Problem 50 
If we have a set of $L$ distinct lines in $\mathbb{R}^3$, what is the maximum number of points that lie on at least three of the lines?

This problem isn't more interesting than the previous one: we can still take the same example as above by just restricting to a plane, so order $L^2$ is still achievable, and the same pair-counting upper bound still applies. But to make the problem more interesting, we can try to reduce this degeneracy and not allow the lines to be coplanar. That finally leads us to the problem:


Definition 51 


Let $\mathcal{L}$ be a set of lines in $\mathbb{R}^3$. A joint of $\mathcal{L}$ is a point $p \in \mathbb{R}^3$ which lies on three non-coplanar lines of $\mathcal{L}$.


It's okay if there are three lines through p that are coplanar, as long as we can find three that are not. And it’s 
important that a joint is a point, not a triple of lines (otherwise we could just have all lines through the origin and the 


problem would not be interesting). And this finally leads us to the joints problem: 


Problem 52 


Let $L \in \mathbb{N}$. What is the maximum number of joints of a set $\mathcal{L}$ of $L$ lines in $\mathbb{R}^3$?


This problem was first posed in the 1990s with an upper bound of $L^{7/4}$, and it wasn't until 2010 that Guth (Professor Larry Guth in our department) and Katz resolved it. We'll start with some constructions: if we take a $k \times k \times k$ grid $\{1, \dots, k\}^3$ together with the axis-parallel lines through the grid points, we use $L = 3k^2$ lines and we get $k^3$ joints, so this allows us to achieve $\Omega(L^{3/2})$ (with a constant of $\frac{1}{3\sqrt{3}}$).

To get a slightly better construction (in fact the best known), we can consider $k$ planes in $\mathbb{R}^3$ in general position (meaning that we don't get extra intersections, for example) and let the lines in $\mathcal{L}$ be the $L = \binom{k}{2}$ intersection lines between any two planes. We then get $\binom{k}{3}$ joints (from the intersection of any three planes), meaning that we again get $\Omega(L^{3/2})$ but now with an improved constant of $\frac{\sqrt{2}}{3}$. And it turns out this is the best possible up to constant factors:

Theorem 53 (Guth–Katz 2010)
If $\mathcal{L}$ is a set of $L$ lines in $\mathbb{R}^3$, then $\mathcal{L}$ has at most $3L^{3/2}$ joints.


Fact 54 
The constant 3 in the theorem above can be optimized, as we'll see in the proof, but it turns out the true constant is actually $\frac{\sqrt{2}}{3}$ as in our second construction above (as proved by Yu and Zhao in 2019).
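The two constants attached to the grid and generic-planes constructions can be checked numerically. This is a minimal sketch with function names of our own choosing; it just evaluates (number of joints) / $L^{3/2}$ for each construction as described in the notes.

```python
from math import comb, sqrt

def grid_ratio(k):
    """joints / L^(3/2) for the k x k x k grid: L = 3k^2 lines, k^3 joints."""
    return k**3 / (3 * k**2) ** 1.5

def planes_ratio(k):
    """joints / L^(3/2) for k generic planes: L = C(k, 2) lines, C(k, 3) joints."""
    return comb(k, 3) / comb(k, 2) ** 1.5

print(grid_ratio(10), 1 / (3 * sqrt(3)))
print(planes_ratio(1000), sqrt(2) / 3)
```

The grid ratio is the same for every $k$, while the planes ratio increases toward $\frac{\sqrt{2}}{3}$ from below as $k$ grows.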


We won't have time to do the whole proof today, but we'll mention the technical details today and do the main 


argument next lecture: 


Lemma 55 


Let $\mathcal{L}$ be a set of lines in $\mathbb{R}^3$ (of any size), and suppose that $\mathcal{L}$ has $J$ joints. Then some line in $\mathcal{L}$ contains at most $2J^{1/3}$ joints.


We'll prove this lemma next time, but for now we can see why it implies Theorem 53: 


Proof of Theorem 53 assuming Lemma 55. Let $f(L)$ be the maximum possible number of joints of $L$ lines; our goal is to show that $f(L) \le 3L^{3/2}$. (Notice that $f(0) = 0$ and that $f(0) \le f(1) \le f(2) \le \cdots$.) We claim that $f(L) \le f(L-1) + 2f(L)^{1/3}$; indeed, if $\mathcal{L}$ is a set of $L$ lines in $\mathbb{R}^3$ with maximally many ($J = f(L)$) joints, then one of the lines contains at most $2J^{1/3} = 2f(L)^{1/3}$ joints. Deleting this line leaves a set of $L - 1$ lines with at most $f(L-1)$ joints, and we could have only destroyed at most $2f(L)^{1/3}$ joints (every destroyed joint lies on the deleted line), proving the claim.


From here, repeatedly applying the claim gives us

\[ f(L) \le f(L-1) + 2f(L)^{1/3} \le f(L-2) + 2f(L-1)^{1/3} + 2f(L)^{1/3} \le \cdots, \]

finally arriving at (using $f(0) = 0$ and the monotonicity of $f$)

\[ f(L) \le 2f(1)^{1/3} + 2f(2)^{1/3} + \cdots + 2f(L)^{1/3} \le L \cdot 2f(L)^{1/3}, \]

so that $f(L)^{2/3} \le 2L$ and thus $f(L) \le 2^{3/2} L^{3/2} < 3L^{3/2}$.

Last lecture, we set up the joints problem, which aims to find the maximum number of joints (points that lie on three non-coplanar lines) formed by a set of $L$ lines. We're aiming to prove Guth and Katz's 2010 result, which proves a bound of $3L^{3/2}$ joints (a better constant can be achieved but we won't discuss it), and we mentioned that the proof follows from the main lemma stated last time, namely that if a set of lines $\mathcal{L}$ (no constraints on the number of them) has exactly $J$ joints, then one of the lines has at most $2J^{1/3}$ joints.


Proof of main lemma. (If $J = 0$ this is clear.) Assume for the sake of contradiction that all lines contain more than $2J^{1/3}$ joints. Much like in the proof of the finite field Kakeya problem, we will try to find a polynomial that vanishes on all joints. Since we don't know what the optimal degree is, we choose a nonzero polynomial $P \in \mathbb{R}[x, y, z]$ of minimum possible degree, such that $P(q) = 0$ for all joints $q$.

We claim that $\deg P \le 2J^{1/3}$. Indeed, recall our lemma that given a subset $X \subseteq \mathbb{F}^n$ of size less than $\binom{n+d}{n}$, there is a nonzero polynomial $P$ of degree at most $d$ such that $P(q) = 0$ for all $q \in X$. So because $n = 3$ here, if we plug $d = 2J^{1/3}$ into the lemma, then we find that $\binom{d+3}{3} = \frac{(d+3)(d+2)(d+1)}{6} > \frac{d^3}{6} = \frac{8J}{6} > J$. (If $2J^{1/3}$ is not an integer, we take $d = \lfloor 2J^{1/3} \rfloor$ and use $(d+1)^3 > 8J$ instead.) So because there are $J$ joints, we do indeed have a nonzero polynomial of degree at most $d$ with the vanishing that we want.
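The degree count in this step is easy to check by machine. The sketch below assumes we round down to $d = \lfloor 2J^{1/3} \rfloor$ (computed with integer arithmetic to avoid floating-point cube roots); the function names are ours.

```python
from math import comb

def degree_for(J):
    """Largest integer d with d^3 <= 8J, i.e. d = floor(2 * J^(1/3))."""
    d = 0
    while (d + 1) ** 3 <= 8 * J:
        d += 1
    return d

def vanishing_lemma_applies(J):
    """True if binom(d + 3, 3) > J, so a nonzero degree-<=d polynomial vanishes on J points."""
    return comb(degree_for(J) + 3, 3) > J

print(all(vanishing_lemma_applies(J) for J in range(1, 2000)))
```

The reason this always succeeds is that $(d+1)^3 > 8J$ forces $\frac{(d+3)(d+2)(d+1)}{6} > \frac{8J}{6} > J$.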

But now $P$ vanishes on more points on each line than $\deg P$, so on each line $P$ must vanish completely (since restricted to that line, by parameterization, $P$ is a one-variable polynomial with more roots than its degree and thus must be the zero polynomial). And now here's a new idea that hasn't come up in proofs before: since $P$ vanishes on lines of the form $\{a + tb : t \in \mathbb{R}\}$, the directional derivative of $P$ in the direction of $b$ vanishes on the entire line as well (since $P$ is constant on that line). Thus,

\[ \nabla P(a) \cdot b = 0 \quad \text{if } \{a + tb : t \in \mathbb{R}\} \in \mathcal{L}. \]


But now we can use the fact that whenever $a$ is a joint, we have three different $b$s which satisfy this equality (corresponding to the three lines through $a$). Furthermore, these three $b$s can be chosen to span all of $\mathbb{R}^3$ (by definition). So $\nabla P(a)$ can be dotted with three different non-coplanar vectors, resulting in $0$ in all cases, and thus $\nabla P(a) = 0$ for any joint $a$ of $\mathcal{L}$.

However, $\nabla P$ is a vector of three entries $\frac{\partial P}{\partial x}, \frac{\partial P}{\partial y}, \frac{\partial P}{\partial z}$, and we've just found that each of these polynomials vanishes on all of the joints. But those polynomials are of lower degree than $P$, and we initially chose $P$ to be of minimum degree. Thus, the only way this could be possible is if each of those polynomials is the zero polynomial, but that would require $P$ to only be a function of $y$ and $z$, and also only a function of $x$ and $z$, and also only a function of $x$ and $y$, which only happens if $P$ is constant, and that's a contradiction when $J > 0$ (a nonzero constant cannot vanish at a joint). (We do need all three polynomials here to get our contradiction!) Thus there must be some line which contains at most $2J^{1/3}$ joints.


This concludes the part of the class discussing polynomial methods relying on the number of roots a polynomial can have based on its degree. We'll now spend a bit of time discussing spectral methods: we'll still be using linear algebra, but instead of just making arguments about linear independence and so on, we'll be considering properties of matrices like eigenvalues.


Definition 56 


Let $G$ be a graph with vertices $\{1, \dots, n\}$. The adjacency matrix of $G$ is the $n \times n$ matrix $A = (a_{ij})$, where $a_{ij}$ is $1$ if $i$ and $j$ are adjacent and $0$ otherwise.


While this matrix has a natural definition which doesn't seem to give us much illuminating information about $G$, it's surprising how much we can actually gain from looking at the eigenvalues of this matrix. But first, we'll make some observations: $A$ has all entries $0$ or $1$ with zeros on the diagonal, so $\operatorname{tr}(A) = 0$ (and thus the sum of the eigenvalues is also zero). And because we have a real symmetric matrix, the spectral theorem tells us that $A$ has all real eigenvalues and that we can find an orthogonal basis of eigenvectors spanning $\mathbb{R}^n$ (we do not even need to consider generalized eigenspaces or Jordan normal form).


Lemma 57 


Let $G$ be a $d$-regular graph (so $\deg(v) = d$ for all vertices $v$). Then $d$ is an eigenvalue of the adjacency matrix $A$ of $G$, and all other eigenvalues $\lambda$ of $A$ satisfy $|\lambda| \le d$.


Proof. Since $G$ is $d$-regular, the adjacency matrix has exactly $d$ 1s in each row. Thus, the all-ones vector will be an eigenvector of $A$ with eigenvalue $d$ (since dotting the all-ones vector with any row gives us $d$). For the other eigenvalues, suppose $\lambda$ is an eigenvalue of $A$, and let $v = (v_1, \dots, v_n)^T$ (a column vector) be a nonzero eigenvector of this eigenvalue $\lambda$. Let $j$ be an index such that $|v_j|$ is maximal, so that $|v_i| \le |v_j|$ for all $i$ and $|v_j| > 0$ (because $v$ is nonzero). If we then look at the $j$th coordinate of $Av = \lambda v$, then we get

\[ \lambda v_j = \sum_{i \sim j} v_i, \]

where $i \sim j$ means that $i$ is a neighbor of $j$ in the graph $G$. But now taking absolute values and using the triangle inequality, we have

\[ |\lambda| |v_j| = \Big| \sum_{i \sim j} v_i \Big| \le \sum_{i \sim j} |v_i| \le d |v_j| \]

because $j$ is adjacent to exactly $d$ other vertices. Thus because $|v_j| > 0$, dividing by it yields $|\lambda| \le d$, as desired.
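The key inequality in this proof says that no coordinate of $Av$ can exceed $d$ times the largest coordinate of $v$ in absolute value. Here is a small sketch checking that on the Petersen graph, which is our choice of example (a 3-regular graph on 10 vertices) and not one used in lecture; all helper names are hypothetical.

```python
import random

def petersen_adjacency():
    """Adjacency matrix of the Petersen graph on vertices 0..9."""
    n = 10
    A = [[0] * n for _ in range(n)]
    def add(u, v):
        A[u][v] = A[v][u] = 1
    for i in range(5):
        add(i, (i + 1) % 5)          # outer 5-cycle
        add(i, i + 5)                # spokes
        add(5 + i, 5 + (i + 2) % 5)  # inner pentagram
    return A

def mat_vec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def sup_norm(v):
    return max(abs(x) for x in v)

def check_row_bound(A, d, trials=500, seed=0):
    """Check max_j |(Av)_j| <= d * max_i |v_i| on random vectors v."""
    rng = random.Random(seed)
    for _ in range(trials):
        v = [rng.uniform(-1, 1) for _ in range(len(A))]
        if sup_norm(mat_vec(A, v)) > d * sup_norm(v) + 1e-9:
            return False
    return True

A = petersen_adjacency()
print(mat_vec(A, [1] * 10), check_row_bound(A, 3))
```

Applying the row bound to an eigenvector $v$ with $Av = \lambda v$ gives $|\lambda| \le d$ immediately, which is the content of the lemma.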


(In particular, the eigenvalue —d shows up if and only if the graph is bipartite, which we'll show on homework.) 
And we'll now see with this next result that we can do matrix calculations with the adjacency matrix, explaining why 


this is a good way to encode the information about the edges of G: 


Lemma 58 


Let $G$ be a graph with adjacency matrix $A$. Then the $(i, j)$ entry of $A^m$ is the number of walks of length $m$ from $i$ to $j$ (paths that are allowed to repeat vertices and edges).


(For example, a walk $1 \to 17 \to 5 \to 1 \to 3$ is of length 4, so the $(1, 3)$ entry of $A^4$ will get a contribution from this walk if all consecutive pairs of vertices in that walk are edges of $G$.) This is basically an exercise in thinking about how matrix multiplication works with adjacency matrices, and it'll be on our homework as well.
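Before doing that exercise, it can help to see the walk-counting lemma hold numerically. The sketch below compares entries of $A^m$ against a brute-force enumeration of walks on a small example graph of our own choosing (a 4-cycle with one chord); it is an illustration, not a proof.

```python
from itertools import product

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def mat_pow(A, m):
    """A^m by repeated multiplication, starting from the identity."""
    n = len(A)
    result = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(m):
        result = mat_mul(result, A)
    return result

def count_walks(A, i, j, m):
    """Count sequences i = w_0, ..., w_m = j with consecutive vertices adjacent (m >= 1)."""
    n = len(A)
    total = 0
    for middle in product(range(n), repeat=m - 1):
        walk = (i, *middle, j)
        if all(A[u][v] for u, v in zip(walk, walk[1:])):
            total += 1
    return total

# Hypothetical example: edges 0-1, 0-2, 0-3, 1-2, 2-3.
A = [[0, 1, 1, 1],
     [1, 0, 1, 0],
     [1, 1, 0, 1],
     [1, 0, 1, 0]]
print(mat_pow(A, 3)[0][2], count_walks(A, 0, 2, 3))
```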
We'll now look at a graph theory result, the friendship theorem, which was originally proved without spectral 


methods but where the proof using adjacency matrices turns out to be simpler. 


Theorem 59 (Friendship theorem) 


Let G be a graph such that any two distinct vertices have exactly one common neighbor. Then the only valid 


graphs G are sets of triangles all joined together at a common vertex (we can check that any two vertices here 


indeed share a neighbor), which we call “windmill graphs.” 


For example, we can imagine that the graph encodes the friendships between a group of people, and any two people have exactly one common friend. Then in particular, this result tells us that there is one person who is friends with everyone else, and then we have a perfect matching of the remaining people. (This result was originally proved by Erdős, Rényi, and Sós in 1966.)


Start of proof. Suppose there is a vertex connected to all other vertices. Then applying the theorem condition to the 
central vertex and any other vertex, we find that that other vertex must have degree 1 among the other remaining 
vertices. Repeating this logic, the graph induced by the remaining vertices has every degree 1, so it must be a perfect 
matching and that gives us the windmill graph situation. So the theorem is satisfied in this case. 

So now we can assume that no vertex is connected to all other vertices. We wish to show that the graph must be regular in this case, and we'll start by getting halfway there. We claim that for any vertices $x, y$ that are not adjacent in $G$, we have $\deg(x) = \deg(y)$. Indeed, let the neighbors of $x$ be $v_1, \dots, v_m$. Applying the theorem condition to $v_i$ and $y$, we get a common neighbor $v_i'$ of $v_i$ and $y$ which is not $x$ (since $x$ is not adjacent to $y$). Furthermore, $v_1'$ through $v_m'$ are all distinct vertices (if $v_i'$ and $v_j'$ were the same for some $i \ne j$, then $v_i$ and $v_j$ would share both $x$ and $v_i' = v_j'$ as neighbors, which is not allowed). Thus $y$ has at least as many neighbors as $x$. Flipping the logic shows the result the other way around as well, so $\deg(x) = \deg(y)$.


We'll continue this proof next time! 


11 March 10, 2022 


Last time, we started discussing the friendship theorem, which we're trying to prove using spectral graph theory. Recall that the setup is as follows: suppose $G$ is a graph such that any two distinct vertices have exactly one common neighbor. We wish to prove that $G$ must be a “windmill graph,” where one vertex is connected to all other vertices and the remaining vertices induce a perfect matching. What we've already shown last time is that if $G$ has a vertex that is adjacent to all other vertices, then $G$ is indeed a windmill graph. Furthermore, we've considered the other case (where no vertex of $G$ is adjacent to all other vertices), and we've shown that any two vertices $x, y$ that are not adjacent satisfy $\deg(x) = \deg(y)$ (by showing that we can associate any neighbor of $x$ with a different neighbor of $y$, and vice versa).

This is useful because spectral graph theory is particularly nice for regular graphs, and now we'll continue the proof:


Continuation of the proof of the friendship theorem. We now know that if no vertex of G is adjacent to all other 
vertices, then G has many equal degrees among its vertices. We'll now take this a step further and claim that G is in 
fact regular. Indeed, suppose we have two vertices x, y that are adjacent in our graph. If we could show that there 
is some vertex z that is not adjacent to either x or y, then the previous claim would give us deg(x) = deg(z) and 
deg(y) = deg(z), so that deg(x) = deg(y). So the only remaining case is the one where every vertex is adjacent to 
either x or y. 

We're now in the case where no vertex of $G$ is adjacent to all other vertices, so in particular, there is some vertex $x'$ which is not adjacent to $x$ and some vertex $y'$ which is not adjacent to $y$. Notice that $x' \ne y$ and $y' \ne x$ because we chose $x$ and $y$ to be adjacent, and in fact $x'$ is adjacent to $y$ (since it's not adjacent to $x$ but needs to be adjacent to either $x$ or $y$) and $y'$ is adjacent to $x$. Furthermore, $x'$ and $y'$ are not the same vertex because they have different relations to $x$ and $y$.

Thus, we have a picture of $y', x, y, x'$ connected in a path in that order in $G$, and using the friendship theorem condition, $x'$ and $y'$ have a common neighbor $z$ which is none of $x, y, x', y'$. This means we have formed a pentagon with the five vertices $x, y, x', z, y'$ in that order, but because $z$ is not $x$ or $y$ it must be connected to one of those vertices by assumption. Without loss of generality, assume $z$ is adjacent to $x$. And this is a contradiction, because now $z$ and $y$ have two common neighbors ($x$ and $x'$) and they should only have one. Thus this case cannot occur: there must be a vertex $z$ adjacent to neither $x$ nor $y$, and we indeed have $\deg(x) = \deg(z) = \deg(y)$.


[Figure: the pentagon on the vertices $x, y, x', z, y'$.]


Bringing things back now, we now know that if $G$ doesn't have a “central vertex” connected to everything else, then $G$ is regular because all degrees are equal (call this common degree $d$). If $d = 0$ the theorem statement is true (we must have the one-vertex graph), and if $d = 1$ we have a perfect matching and $G$ cannot exist. So $d \ge 2$, and from here we will actually finish the proof pretty quickly using spectral graph theory. Suppose we have $n$ vertices, and let $A$ be the $n \times n$ adjacency matrix of $G$. Then the $(i, j)$ entry of $A^2$ is the number of walks from $i$ to $j$ in exactly two steps, which is $d$ for each diagonal entry $(i, i)$ (walk to a neighbor of $i$ and back) and $1$ for each off-diagonal entry $(i, j)$ (use the only common neighbor of $i$ and $j$). (Here is where it's important that we're talking about walks, not paths!) Thus, $A^2$ is of the form

\[ A^2 = \begin{pmatrix} d & 1 & \cdots & 1 \\ 1 & d & \cdots & 1 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & \cdots & d \end{pmatrix} = (d - 1) \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix} + \begin{pmatrix} 1 & 1 & \cdots & 1 \\ 1 & 1 & \cdots & 1 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & \cdots & 1 \end{pmatrix} \]

(notice that it's nice that even though we barely know any of the entries of $A$, we know all of $A^2$ explicitly), and now we can compute the eigenvalues of $A^2$: the entire $(n-1)$-dimensional subspace $x_1 + \cdots + x_n = 0$ is in the kernel of the all-ones matrix, so $A^2$ has an eigenvalue of $d - 1$ with multiplicity $n - 1$. Then the final eigenvalue is $n + d - 1$ (coming from the all-ones vector; we could also have found this by computing the trace of $A^2$).


We thus know the eigenvalues of $A$ up to a sign: they're the ($\pm$) square roots of the eigenvalues of $A^2$. But remember that for any $d$-regular graph (which $G$ is), one of the eigenvalues of $A$ must be $d$. And $\sqrt{d-1} < d$ for $d \ge 2$, so the only possibility is $\sqrt{n + d - 1} = d$. It turns out the specific value of $n$ doesn't matter: instead, we find that the eigenvalues of $A$ are $d = \sqrt{n + d - 1}$ and $n - 1$ eigenvalues that are each one of $\pm\sqrt{d-1}$. Furthermore, the trace of $A$ is zero, so we have

\[ d + \alpha\sqrt{d-1} - \beta\sqrt{d-1} = 0 \]

for some nonnegative integers $\alpha, \beta$ with $\alpha + \beta = n - 1$. But letting $\beta - \alpha = k$, we find that

\[ d = k\sqrt{d-1} \implies k^2(d-1) = d^2, \]

which implies that $(d-1) \mid d^2 \implies (d-1) \mid d^2 - (d-1)(d+1) = 1$, which can only occur if $d = 2$ (since we assumed $d \ge 2$). Then $n = d^2 - d + 1 = 3$, so we have a 2-regular graph on 3 vertices, and that is in fact a situation where $G$ has a vertex adjacent to all other vertices (this is the single-triangle windmill graph). (Alternatively, if we didn't know the value of $n$, we could note that a 2-regular graph is a collection of cycles which needs to be connected.) This concludes the proof: even in this case, the only valid $G$ is a windmill graph.
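For very small vertex counts the friendship theorem can be confirmed by exhaustive search, and the divisibility step at the end of the proof can be checked directly. The sketch below is only an illustration under that restriction (it enumerates all $2^{\binom{n}{2}}$ labeled graphs, so it is feasible only for roughly $n \le 6$); the function names are ours.

```python
from itertools import combinations

def friendship_graphs(n):
    """All labeled graphs on n vertices where every two distinct vertices
    have exactly one common neighbor, returned as lists of neighbor sets."""
    pairs = list(combinations(range(n), 2))
    found = []
    for mask in range(1 << len(pairs)):
        adj = [set() for _ in range(n)]
        for bit, (u, v) in enumerate(pairs):
            if (mask >> bit) & 1:
                adj[u].add(v)
                adj[v].add(u)
        if all(len(adj[u] & adj[v]) == 1 for u, v in pairs):
            found.append(adj)
    return found

def has_universal_vertex(adj):
    """True if some vertex is adjacent to all other vertices."""
    return any(len(nbrs) == len(adj) - 1 for nbrs in adj)

def admissible_degrees(limit):
    """Degrees d in [2, limit) with (d - 1) dividing d^2, as in the trace argument."""
    return [d for d in range(2, limit) if (d * d) % (d - 1) == 0]

graphs = friendship_graphs(5)
print(len(graphs), all(has_universal_vertex(g) for g in graphs))
print(admissible_degrees(1000))
```

On 5 vertices every graph found is a windmill of two triangles sharing a vertex, and the only admissible degree in the regular case is $d = 2$, matching the proof.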


We'll now move on to a powerful result, which concerns eigenvalue expansion properties of graphs (in other 
words, instead of employing spectral graph theory to prove a result, we base our result off of spectral graph theory 
from the start). Intuitively, a graph is an expander if it is very well-connected, meaning that between any subset of 
the vertices and its complement, there are a lot of edges (so we don’t have a situation where there's one half of the 
graph connected to the rest by only a single edge). We won't go into the details of those definitions precisely, but we 
will talk about how we can understand expansion properties from eigenvalues. 

Here's the setup: let $G$ be a $d$-regular graph on $n$ vertices, let $A$ be its adjacency matrix, and let $\lambda_1 \ge \cdots \ge \lambda_n$ be the eigenvalues (with multiplicity) of $A$. The spectral theorem gives us an orthonormal basis of $\mathbb{R}^n$ consisting of eigenvectors $v_1, \dots, v_n$ (meaning that $Av_i = \lambda_i v_i$ for all $i$). We know that $\lambda_1 = d$ and $\lambda_2 \ge \cdots \ge \lambda_n \ge -d$; we will use the convention that $v_1$ is always the normalized all-ones vector (with $\frac{1}{\sqrt{n}}$ in each entry), even when $d$ is an eigenvalue of higher multiplicity.

Definition 60
For a $d$-regular graph $G$ and subsets of vertices $B, C \subseteq V(G)$ (not necessarily disjoint), define the number of edges between $B$ and $C$ to be

\[ e(B, C) = |\{(u, w) \in B \times C : u, w \text{ adjacent}\}|. \]


(In particular, edges contained within both $B$ and $C$ are counted twice, and if $C = B^c$ then we do have the ordinary number of edges between $B$ and its complement.) We can in fact compute this quantity using the adjacency matrix, because

\[ e(B, C) = 1_B^T A 1_C \]

for the characteristic vectors $1_B, 1_C$ of $B$ and $C$ (we get a contribution of 1 whenever there's a 1 in some entry of $1_B$, a 1 in some entry of $1_C$, and then a 1 in the corresponding spot in $A$ that connects those vertices in $B$ and $C$).
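The two descriptions of $e(B, C)$, as a count of ordered pairs and as the product $1_B^T A 1_C$, can be compared directly. This is a minimal sketch on the triangle $K_3$ (our choice of example), with hypothetical function names.

```python
def edge_count_pairs(A, B, C):
    """Count ordered pairs (u, w) in B x C with u and w adjacent."""
    return sum(1 for u in B for w in C if A[u][w])

def edge_count_matrix(A, B, C):
    """Compute 1_B^T A 1_C using characteristic vectors."""
    n = len(A)
    one_B = [int(v in B) for v in range(n)]
    one_C = [int(v in C) for v in range(n)]
    A_one_C = [sum(A[u][w] * one_C[w] for w in range(n)) for u in range(n)]
    return sum(one_B[u] * A_one_C[u] for u in range(n))

triangle = [[0, 1, 1],
            [1, 0, 1],
            [1, 1, 0]]
# The edge {0, 1} lies inside both B and C, so it is counted twice.
print(edge_count_pairs(triangle, {0, 1}, {0, 1}), edge_count_matrix(triangle, {0, 1}, {0, 1}))
```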


Lemma 61 


Let $G$ be a $d$-regular graph on $n$ vertices. Then for any subset $B \subseteq V(G)$, we have

\[ e(B, V(G) \setminus B) \ge (d - \lambda_2) \frac{|B|(n - |B|)}{n}. \]

In other words, the number of edges required between $B$ and the rest of the graph can be characterized by the spectral gap between the first and second eigenvalues of the adjacency matrix of $G$ (this inequality is useful whenever $\lambda_2 < d$, that is, when we don't have a multiple eigenvalue at $d$). To prove this, we'll make use of the following elementary fact:


Fact 62 


From linear algebra, we know that because $v_1, \dots, v_n$ form an orthonormal basis, any $w \in \mathbb{R}^n$ can be written as $w = a_1 v_1 + \cdots + a_n v_n$ for $a_i = w \cdot v_i$, so that $\|w\|^2 = a_1^2 + \cdots + a_n^2$.


Proof of lemma. Let $\bar{B}$ denote the complement $V(G) \setminus B$ of $B$ for convenience, and let $1_B$ and $1_{\bar{B}}$ be the characteristic vectors of $B$ and $\bar{B}$, respectively. Notice that $1_B + 1_{\bar{B}}$ is the all-ones vector. Then

\[ e(B, \bar{B}) = 1_B^T A 1_{\bar{B}} = 1_B \cdot (A 1_{\bar{B}}). \]

Using the fact above, we can write our characteristic vector in terms of the orthonormal basis of eigenvectors

\[ 1_B = b_1 v_1 + \cdots + b_n v_n, \]

where in particular $b_1 = 1_B \cdot v_1 = \frac{|B|}{\sqrt{n}}$ (because $v_1$ is $\frac{1}{\sqrt{n}}$ times the all-ones vector). Additionally, because $1_{\bar{B}}$ is the all-ones vector minus $1_B$, which is $\sqrt{n} v_1 - 1_B$, we have

\[ 1_{\bar{B}} = \Big( \sqrt{n} - \frac{|B|}{\sqrt{n}} \Big) v_1 - b_2 v_2 - \cdots - b_n v_n = \frac{n - |B|}{\sqrt{n}} v_1 - b_2 v_2 - \cdots - b_n v_n. \]

Now the action of $A$ on $1_{\bar{B}}$ is simple to write down, because each term is an eigenvector of $A$:


\[ A 1_{\bar{B}} = \lambda_1 \frac{n - |B|}{\sqrt{n}} v_1 - \lambda_2 b_2 v_2 - \cdots - \lambda_n b_n v_n. \]

Dotting this with $1_B$ and using orthonormality of the $v_i$s now gives us

\[ e(B, \bar{B}) = 1_B^T A 1_{\bar{B}} = \lambda_1 \frac{|B|(n - |B|)}{n} - \lambda_2 b_2^2 - \lambda_3 b_3^2 - \cdots - \lambda_n b_n^2. \]

But because $\lambda_2 \ge \lambda_3 \ge \cdots \ge \lambda_n$ and $\lambda_1 = d$, we can lower bound this by

\[ e(B, \bar{B}) \ge d \frac{|B|(n - |B|)}{n} - \lambda_2 (b_2^2 + \cdots + b_n^2), \]

and now using that $b_2^2 + \cdots + b_n^2 = \|1_B\|^2 - b_1^2 = |B| - \frac{|B|^2}{n} = \frac{|B|(n - |B|)}{n}$, we can simplify this as

\[ e(B, \bar{B}) \ge d \frac{|B|(n - |B|)}{n} - \lambda_2 \frac{|B|(n - |B|)}{n} = (d - \lambda_2) \frac{|B|(n - |B|)}{n}, \]

as desired.

Last lecture, we discussed expansion properties that we can deduce from the eigenvalues of the adjacency matrix of a graph. Specifically, we know that the largest eigenvalue of a $d$-regular graph's adjacency matrix is $d$ (corresponding to a normalized all-ones vector), and then we can bound the number of edges between a subset of the vertices and the rest of them in terms of the spectral gap $d - \lambda_2$:

\[ e(B, V(G) \setminus B) \ge (d - \lambda_2) \frac{|B|(n - |B|)}{n}. \]

We may ask whether there's a corresponding upper bound in terms of the spectral gap, and we may recall that we bounded a bunch of eigenvalues by $\lambda_3, \dots, \lambda_n \le \lambda_2$ (this was the only inequality we had in the proof). So we can do the same thing but replacing $\lambda_2$ with $\lambda_n$ to get an upper bound, but that's not a very good bound because $\lambda_n$ is typically far from $d$ and we already have the trivial bound $e(B, V(G) \setminus B) \le d \cdot \min(|B|, n - |B|)$.

But now we'll generalize and ask about $e(B, C)$ more generally (recall that this is the number of ordered pairs $(u, w) \in B \times C$ with $u$ and $w$ adjacent), and an upper bound might be more interesting in this case. If we imagine a $d$-regular bipartite graph and we set $B$ and $C$ to be subsets both within the same part, then $e(B, C) = 0$; furthermore, the graph is bipartite if and only if $\lambda_n = -d$ (we will show this on our homework). In other words, often we want to control both the distance between $\lambda_2$ and $d$ and the distance between $\lambda_n$ and $-d$ if we want to get upper bounds.
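The spectral-gap lower bound on $e(B, V(G) \setminus B)$ recalled above can be verified exhaustively on a small example. The sketch below uses the cycle $C_n$, which is our choice of example: it is 2-regular, and we rely on the standard fact (not derived in these notes) that its adjacency eigenvalues are $2\cos(2\pi k / n)$, so $\lambda_2 = 2\cos(2\pi / n)$. The function name is hypothetical.

```python
from itertools import combinations
from math import cos, pi

def cycle_cut_check(n):
    """Check e(B, complement) >= (d - lambda_2) |B| (n - |B|) / n for every B in C_n."""
    d = 2
    lambda_2 = 2 * cos(2 * pi / n)  # second-largest adjacency eigenvalue of C_n
    for size in range(n + 1):
        for B in combinations(range(n), size):
            B = set(B)
            # An edge {v, v+1} crosses the cut when exactly one endpoint is in B.
            cut = sum(1 for v in range(n) if (v in B) != ((v + 1) % n in B))
            if cut < (d - lambda_2) * size * (n - size) / n - 1e-9:
                return False
    return True

print(cycle_cut_check(8))
```

For cycles the spectral gap $2 - 2\cos(2\pi/n)$ shrinks quickly as $n$ grows, which matches the intuition that a long cycle is a poor expander.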


Lemma 63 
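Let $G$ be a $d$-regular graph on $n$ vertices with adjacency matrix eigenvalues $\lambda_1 \ge \cdots \ge \lambda_n$. Suppose there is some $\lambda > 0$ such that $|\lambda_i| \le \lambda$ for all $2 \le i \le n$ (equivalently, $\lambda_2 \le \lambda$ and $\lambda_n \ge -\lambda$). Then for any subsets $B, C \subseteq V(G)$, we have

\[ \Big| e(B, C) - \frac{d}{n} |B||C| \Big| \le \lambda \sqrt{|B||C|}. \]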


The idea here is that the density of the graph is around $\frac{d}{n}$ (if we pick two vertices independently at random, the probability that they are adjacent is $\frac{d}{n}$). So we should expect that among the pairs $(u, w) \in B \times C$, about $\frac{d}{n}|B||C|$ of them are adjacent on average. And this result tells us that we are indeed close to this mean, with the error controlled by $\lambda$. (But this is only really useful when $\lambda$ is significantly smaller than $d$; for example, in the case where we have the $d$-regular bipartite graph and $e(B, C) = 0$, we must take $\lambda = d$ and the inequality reduces to $|B| \cdot |C| \le n^2$, which is useless.)

Proof. As usual, let $1_B, 1_C \in \mathbb{R}^n$ be the characteristic (indicator) vectors of $B$ and $C$, respectively. We again have $e(B, C) = 1_B^T A 1_C$, and again we'll express the indicator vectors in terms of the orthonormal eigenvector basis:

\[ 1_B = b_1 v_1 + \cdots + b_n v_n, \qquad b_1 = \frac{|B|}{\sqrt{n}} \]

(this is repeating the same calculation as last lecture; remember that $b_1^2 + \cdots + b_n^2 = \|1_B\|^2 = |B|$) and similarly

\[ 1_C = c_1 v_1 + \cdots + c_n v_n, \qquad c_1 = \frac{|C|}{\sqrt{n}} \]

and $c_1^2 + \cdots + c_n^2 = |C|$. Then plugging into the formula for $e(B, C)$,

\[ e(B, C) = 1_B^T A 1_C = 1_B \cdot (A 1_C) = (b_1 v_1 + \cdots + b_n v_n) \cdot (\lambda_1 c_1 v_1 + \cdots + \lambda_n c_n v_n), \]

and now by orthonormality this simplifies to

\[ e(B, C) = \lambda_1 b_1 c_1 + \lambda_2 b_2 c_2 + \cdots + \lambda_n b_n c_n. \]


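The first term here is $d \cdot \frac{|B|}{\sqrt{n}} \cdot \frac{|C|}{\sqrt{n}} = \frac{d}{n}|B||C|$, which is a good sign, and now we want to bound the remaining terms. Subtracting off the first term from the rest, we now have

\[ e(B, C) - \frac{d}{n}|B||C| = \lambda_2 b_2 c_2 + \cdots + \lambda_n b_n c_n, \]

so that

\[ \Big| e(B, C) - \frac{d}{n}|B||C| \Big| \le \lambda \big( |b_2||c_2| + \cdots + |b_n||c_n| \big) \]

(this is where we use the assumption that $|\lambda_i| \le \lambda$ for all $i \ge 2$). And now by Cauchy-Schwarz, this is at most

\[ \lambda \sqrt{b_2^2 + \cdots + b_n^2} \, \sqrt{c_2^2 + \cdots + c_n^2} \le \lambda \sqrt{|B|} \sqrt{|C|}, \]

which is precisely the result we want.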


(Notice that in the last inequality we've ignored the effect of $b_1^2 = \frac{|B|^2}{n}$ and $c_1^2 = \frac{|C|^2}{n}$ in the square roots; if we kept those in our consideration, we'd end up with a very similar result to the one we proved last lecture.) So the moral of the story here is that having good information about eigenvalues can tell us interesting information about edges in a graph.
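To see the inequality of Lemma 63 in action, here is a spot check on the Petersen graph. This example is our own choice, not from lecture: the Petersen graph is 3-regular on 10 vertices with adjacency eigenvalues $3, 1, -2$ (a standard fact), so we may take $\lambda = 2$. The sketch samples random pairs of subsets rather than proving anything, and the helper names are hypothetical.

```python
import random
from math import sqrt

def petersen_neighbors():
    """Neighbor sets of the Petersen graph on vertices 0..9."""
    nbrs = [set() for _ in range(10)]
    def add(u, v):
        nbrs[u].add(v)
        nbrs[v].add(u)
    for i in range(5):
        add(i, (i + 1) % 5)          # outer 5-cycle
        add(i, i + 5)                # spokes
        add(5 + i, 5 + (i + 2) % 5)  # inner pentagram
    return nbrs

def e(nbrs, B, C):
    """Number of ordered pairs (u, w) in B x C with u and w adjacent."""
    return sum(1 for u in B for w in C if w in nbrs[u])

def mixing_holds(nbrs, B, C, d, lam):
    """Check |e(B, C) - (d/n)|B||C|| <= lam * sqrt(|B||C|)."""
    n = len(nbrs)
    deviation = abs(e(nbrs, B, C) - d * len(B) * len(C) / n)
    return deviation <= lam * sqrt(len(B) * len(C)) + 1e-9

def spot_check(trials=2000, seed=0):
    rng = random.Random(seed)
    nbrs = petersen_neighbors()
    for _ in range(trials):
        B = {v for v in range(10) if rng.random() < 0.5}
        C = {v for v in range(10) if rng.random() < 0.5}
        if not mixing_holds(nbrs, B, C, d=3, lam=2):
            return False
    return True

print(spot_check())
```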

We're now going to see how these kinds of methods can be applied to a recently proved and celebrated result, the 


sensitivity conjecture. But we'll have to do some more preparation for that first: 


Theorem 64 (Min-max principle) 
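Let $A$ be a real symmetric $n \times n$ matrix (for example, an adjacency matrix), and let $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$ be its eigenvalues. Then for any $1 \le j \le n$, we can write the $j$th eigenvalue in two different ways (where $U$ ranges over subspaces of $\mathbb{R}^n$):

\[ \lambda_j = \max_{\substack{U \subseteq \mathbb{R}^n \\ \dim U = j}} \Big( \min_{\substack{x \in U \\ \|x\| = 1}} (Ax) \cdot x \Big) = \min_{\substack{U \subseteq \mathbb{R}^n \\ \dim U = n - j + 1}} \Big( \max_{\substack{x \in U \\ \|x\| = 1}} (Ax) \cdot x \Big). \]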


In other words, we either find the $j$-dimensional subspace for which we can always guarantee the best lower bound for $(Ax) \cdot x$, or we find the $(n - j + 1)$-dimensional subspace for which we can find the best universal upper bound for $(Ax) \cdot x$.

Proof. Let $v_1, \dots, v_n$ be an orthonormal basis of eigenvectors of $A$ with corresponding eigenvalues $\lambda_1, \dots, \lambda_n$. Any vector $x$ can be represented as $c_1 v_1 + \cdots + c_n v_n$, and

\[ (Ax) \cdot x = (\lambda_1 c_1 v_1 + \cdots + \lambda_n c_n v_n) \cdot (c_1 v_1 + \cdots + c_n v_n) = \lambda_1 c_1^2 + \cdots + \lambda_n c_n^2. \]

Since $x$ is always of norm 1 in the statement of the min-max principle, we must have $c_1^2 + \cdots + c_n^2 = 1$. So we can think of each choice of $x$ as a choice of weights on the $\lambda_i$s, and $(Ax) \cdot x$ is always just a weighted average of the $\lambda_i$s (in particular it is always between $\lambda_n$ and $\lambda_1$).


We'll show that both the max-min and min-max expressions are equal to $\lambda_j$ by showing both directions of inequalities:

* For the max-min term, to prove that it is at least $\lambda_j$, we can consider the subspace $U = \operatorname{span}(v_1, \dots, v_j)$. Then we can only make use of the coefficients $c_1$ through $c_j$, so $(Ax) \cdot x$ (for a unit vector $x \in U$) is always a weighted average of $\lambda_1, \dots, \lambda_j$, and thus the minimum value is $\lambda_j$. This means we've found a subspace $U$ where the parenthetical term is $\lambda_j$, so the maximum is indeed at least $\lambda_j$.

  Similarly, we can prove that the min-max term is at most $\lambda_j$ by picking $U = \operatorname{span}(v_j, \dots, v_n)$.

* It might seem like the other direction is more difficult because we have to look at general subspaces $U$, but it turns out to be not too bad. Suppose for the sake of contradiction that the min-max term were strictly smaller than $\lambda_j$, so that there is some subspace $U^* \subseteq \mathbb{R}^n$ of dimension $n - j + 1$ with $\max_{x \in U^*, \|x\| = 1} (Ax) \cdot x < \lambda_j$; in other words, $(Ax) \cdot x < \lambda_j$ for all unit-norm $x \in U^*$. Then if we consider $U = \operatorname{span}(v_1, \dots, v_j)$, a $j$-dimensional space, we know that $U$ and $U^*$ have a common nonzero vector (which we can scale to have unit norm) because the sum of their dimensions is $n + 1$. But we proved (in the first bullet point) that $(Ax) \cdot x \ge \lambda_j$ for all unit-norm $x \in U$, so we arrive at a contradiction and thus we must have equality.

  A similar argument works for proving the equality in the max-min term by intersecting a problem subspace with $\operatorname{span}(v_j, \dots, v_n)$.


We'll now mention another result that we'll make use of towards the sensitivity conjecture: 


Theorem 65 (Cauchy interlace theorem) 
Let $A$ be a real symmetric $n \times n$ matrix, and let $\lambda_1 \ge \cdots \ge \lambda_n$ be its eigenvalues. Let $B$ be an $(n-1) \times (n-1)$ matrix obtained by deleting the $k$th row and $k$th column of $A$ (which is notably still symmetric); call its eigenvalues $\mu_1 \ge \cdots \ge \mu_{n-1}$. Then the eigenvalues interlace: 

\[
\lambda_1 \ge \mu_1 \ge \lambda_2 \ge \mu_2 \ge \cdots \ge \lambda_{n-1} \ge \mu_{n-1} \ge \lambda_n.
\]

(This may remind us of Rolle's theorem from calculus, which tells us that the roots of a real-rooted polynomial $f$ and its derivative $f'$ interlace. But the two results aren't really related.) Our proof will make use of the min-max principle: 


Proof. Without loss of generality, we'll assume that we delete the $n$th row and $n$th column. (We can do this because the eigenvalues of $A$ do not change if we apply the same permutation to the rows and columns of $A$: the eigenvectors get similarly permuted but the eigenvalues stay the same.) If we delete the $n$th row and the $n$th column, it's now natural to embed $\mathbb{R}^{n-1}$ into the first $(n - 1)$ coordinates of $\mathbb{R}^n$: for any $y \in \mathbb{R}^{n-1}$, consider $x = (y, 0)$. We know that $\|x\| = \|y\|$ and also that 

\[
(By) \cdot y = (Ax) \cdot x,
\]

because $B$ is the top left $(n-1) \times (n-1)$ submatrix of $A$ and the additional entries of $A$ don't do anything if the $n$th entry of $x$ is zero. So now by Theorem 64 on the matrix $B$ and then embedding $\mathbb{R}^{n-1}$ into $\mathbb{R}^n$, for any $1 \le j \le n-1$, 

\[
\mu_j = \max_{\substack{U \subseteq \mathbb{R}^{n-1} \\ \dim U = j}} \left( \min_{\substack{y \in U \\ \|y\| = 1}} (By) \cdot y \right) = \max_{\substack{U \subseteq \mathbb{R}^{n-1} \times \{0\} \subseteq \mathbb{R}^n \\ \dim U = j}} \left( \min_{\substack{x \in U \\ \|x\| = 1}} (Ax) \cdot x \right)
\]

(in other words, we thought of everything with an appended 0). But this is like the max-min term for $\lambda_j$ of the matrix $A$, except with a smaller set of possibilities, so we can bound this as 

\[
\mu_j \le \max_{\substack{U \subseteq \mathbb{R}^n \\ \dim U = j}} \left( \min_{\substack{x \in U \\ \|x\| = 1}} (Ax) \cdot x \right) = \lambda_j.
\]

Finally, we also have (again by Theorem 64 on the matrix $B$) that 

\[
\mu_j = \min_{\substack{U \subseteq \mathbb{R}^{n-1} \\ \dim U = (n-1)-j+1}} \left( \max_{\substack{y \in U \\ \|y\| = 1}} (By) \cdot y \right) = \min_{\substack{U \subseteq \mathbb{R}^{n-1} \times \{0\} \subseteq \mathbb{R}^n \\ \dim U = n-(j+1)+1}} \left( \max_{\substack{x \in U \\ \|x\| = 1}} (Ax) \cdot x \right),
\]

and again we're minimizing over only a subset of the subspaces in the min-max principle, so the minimum over all subspaces can only be smaller: 

\[
\mu_j \ge \min_{\substack{U \subseteq \mathbb{R}^n \\ \dim U = n-(j+1)+1}} \left( \max_{\substack{x \in U \\ \|x\| = 1}} (Ax) \cdot x \right) = \lambda_{j+1},
\]

and chaining together all of these inequalities completes the proof. 
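
As a quick sanity check of the interlacing statement, here is a small worked example that is not from the lecture: take $A$ to be the adjacency matrix of the path on three vertices and delete its last row and column to get $B$. 

```latex
A = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix},
\qquad
B = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.
% characteristic polynomial of A: \det(A - tI) = -t^3 + 2t, so its roots are \sqrt{2}, 0, -\sqrt{2}
% eigenvalues of B: 1, -1
\lambda_1 = \sqrt{2} \;\ge\; \mu_1 = 1 \;\ge\; \lambda_2 = 0 \;\ge\; \mu_2 = -1 \;\ge\; \lambda_3 = -\sqrt{2}.
```

So the two eigenvalues of $B$ sit in the two gaps between the three eigenvalues of $A$, exactly as the theorem predicts. 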


We'll state and prove the sensitivity conjecture next lecture, and the idea is that we'll want to repeatedly use the Cauchy interlacing theorem. So here's what we arrive at when we do that: 

Corollary 66 
Let $A$ be a real symmetric $n \times n$ matrix, and let $I \subseteq \{1, \dots, n\}$ be a subset of size $|I| = m$. Let $B$ be the symmetric $m \times m$ submatrix obtained by only picking the rows and columns indexed within $I$. If $\lambda_1 \ge \cdots \ge \lambda_n$ are the eigenvalues of $A$, and $\mu_1 \ge \cdots \ge \mu_m$ are the eigenvalues of $B$, then for any $1 \le j \le m$, 

\[
\lambda_j \ge \mu_j \ge \lambda_{j+n-m}.
\]

In other words, we delete $(n - m)$ rows and columns from $A$, and each time the second inequality has index increased by one. (So we don't get the strictly interlacing pattern that we do in the Cauchy interlacing result, but we can still bound the eigenvalues of $B$ in terms of the eigenvalues of $A$.) 


13 March 17, 2022 


We discussed Cauchy's interlace theorem last time, which explains that the eigenvalues of an $(n-1) \times (n-1)$ principal submatrix of a real symmetric $n \times n$ matrix interlace the eigenvalues of the original matrix. This result can then be extended to arbitrary-size $m \times m$ principal submatrices: we find that if the original eigenvalues are $\lambda_1 \ge \cdots \ge \lambda_n$, and the submatrix eigenvalues are $\mu_1 \ge \cdots \ge \mu_m$, then (we don't have strict interlacing in the same way anymore, but) $\lambda_j \ge \mu_j \ge \lambda_{j+n-m}$ for all $1 \le j \le m$. 

Today, we'll be using this to prove a more exciting result, the sensitivity conjecture (proved very recently, in 2019, by Hao Huang). This conjecture was posed in the 1990s, discussing sensitivity of Boolean functions, and it is important for applications in computer science, but we won't talk too much about the history of the problem here because of time constraints. (The introduction of Huang's paper does a good job giving background here if we're curious.) It turns out that the problem is equivalent to a problem about induced subgraphs of a hypercube: 


Definition 67 
Let $Q^n$ be the $n$-dimensional hypercube graph, which has vertex set $\{0, 1\}^n$ and has two vertices connected if they differ in exactly one coordinate (that is, their Hamming distance is 1). 

For example, $Q^2$ looks like a square, and $Q^3$ looks like the wire outline of a cube. 

Theorem 68 (Huang, 2019) 
For any positive integer $n$, any $(2^{n-1}+1)$-vertex induced subgraph of the $n$-dimensional hypercube $Q^n$ has maximum degree at least $\sqrt{n}$. 

In other words, if we pick slightly more than half of the vertices of the hypercube, at least one of the vertices in that set would be connected to at least $\sqrt{n}$ of the others in that set. And we need to take more than half the vertices, because if we only take $2^{n-1}$ vertices, we can take all vertices with even sum of coordinates (in other words, "checkerboard 2-color" the hypercube graph), and then all vertices in the induced subgraph have degree 0. So somehow adding one more vertex makes a significant difference to this picture! 
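
For very small $n$ the jump from $2^{n-1}$ to $2^{n-1}+1$ vertices can be checked by brute force. The following is only an illustrative sketch (the function names `min_max_degree` and `ceil_sqrt` are our own, and vertices are encoded as $n$-bit integers); it enumerates every vertex subset of the given size, so it is only feasible up to about $n = 4$. 

```python
from itertools import combinations
from math import isqrt


def min_max_degree(n, size):
    """Smallest maximum degree over all induced subgraphs of Q^n on `size` vertices."""
    best = n
    for subset in combinations(range(2 ** n), size):
        chosen = set(subset)
        # A vertex v is an n-bit integer; its neighbours are v with one bit flipped.
        max_deg = max(
            sum((v ^ (1 << i)) in chosen for i in range(n)) for v in subset
        )
        best = min(best, max_deg)
    return best


def ceil_sqrt(n):
    """Ceiling of the square root of a nonnegative integer."""
    r = isqrt(n)
    return r if r * r == n else r + 1


for n in (2, 3, 4):
    half = 2 ** (n - 1)
    # Columns: n, best case with half the vertices, best case with one more, ceil(sqrt(n)).
    print(n, min_max_degree(n, half), min_max_degree(n, half + 1), ceil_sqrt(n))
```

With exactly half of the vertices the even-parity set gives maximum degree 0, while with one extra vertex the theorem forces the third printed column to be at least $\sqrt{n}$. 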


Fact 69 
Many people had tried proving this result because of its importance, but the previously known best bound was $(\tfrac{1}{2} - o(1)) \log_2 n$ (proved in 1989). And in fact, there are constructions where the maximum degree is $\lceil \sqrt{n} \rceil$ (we'll see this on our homework), so this bound is exactly tight. 

This proof uses spectral graph theory, but it uses a trick: we have to take a signed adjacency matrix in which some of the edges correspond to 1s and others correspond to $-1$s. 


Lemma 70 
Let $G$ be an $m$-vertex graph, and let $A$ be a symmetric $m \times m$ matrix with entries in $\{-1, 0, 1\}$ and rows and columns indexed by vertices $V(G)$. Suppose that $A_{u,v} \in \{-1, 1\}$ if $u$ and $v$ are adjacent, and $A_{u,v} = 0$ otherwise (including the diagonals). If $\lambda_1(A) \ge \cdots \ge \lambda_m(A)$ are the eigenvalues of $A$, then $\lambda_1(A)$ is at most the maximum degree of $G$. 

In other words, we take the adjacency matrix of $G$, and we place signs on the entries in any way as long as $A$ stays symmetric. And the proof of this result is very similar to when we proved that a $d$-regular graph has a maximum eigenvalue of $d$: 


Proof of lemma. Suppose we have an eigenvector $x = (x_v)_{v \in V(G)} \in \mathbb{R}^{V(G)}$ (coordinates also indexed by vertices) of our matrix $A$ with eigenvalue $\lambda_1(A)$. Pick a vertex $u \in V(G)$ such that $|x_u|$ is maximal; we know that $|x_u| > 0$ (because otherwise $x$ would be the zero vector). Then the $u$th coordinate of the eigenvalue equation $Ax = \lambda_1(A) x$ reads 

\[
\sum_{v \in V(G)} A_{u,v} x_v = \lambda_1(A) x_u,
\]

and now the left-hand side is just a sum over the vertices $v$ adjacent to $u$ (because otherwise $A_{u,v} = 0$). Taking absolute values of both sides, we find 

\[
\left| \sum_{v \sim u} A_{u,v} x_v \right| = |\lambda_1(A)| \, |x_u|,
\]

and now by the triangle inequality and the fact that $A_{u,v} = \pm 1$ for all $v \sim u$, we find 

\[
|\lambda_1(A)| \, |x_u| \le \sum_{v \sim u} |x_v| \le \deg(u) \, |x_u|.
\]

Dividing through by $|x_u|$ gives us $|\lambda_1(A)| \le \deg(u)$, which is at most the maximum degree of $G$. Thus $\lambda_1(A)$ is indeed at most the max degree of $G$ as well. 


If we look at this proof, we can see that the result would also hold if we allowed $A_{u,v} \in [-1, 1]$ for all $u \sim v$, but we only need the setting of the lemma for our signed adjacency matrices. And we'll now see the insight behind choosing the correct signed adjacency matrix in Huang's proof: 


Proof of Theorem 68. Let $A_n$ be the $2^n \times 2^n$ matrix defined recursively via the block recurrence 

\[
A_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad A_n = \begin{pmatrix} A_{n-1} & I_{2^{n-1}} \\ I_{2^{n-1}} & -A_{n-1} \end{pmatrix},
\]

where $I_{2^{n-1}}$ is the $2^{n-1} \times 2^{n-1}$ identity matrix (with 1s on the diagonal and 0s off the diagonal). We can check that $A_n$ is a real symmetric matrix by induction, because the two identity blocks flip to each other and $\pm A_{n-1}$ are both symmetric. 

We claim that $A_n$ is a signed adjacency matrix for $Q^n$. In other words, $A_n$ has entries in $\{-1, 0, 1\}$, and if we replace all $-1$s with 1s, we get the adjacency matrix of $Q^n$ (for some ordering of the vertices). The first part (entries all in $\{-1, 0, 1\}$) is clear by induction, and the second part follows by the following argument: order the vertices lexicographically (in other words, order them like the natural numbers when we encode them as binary strings), and let $A_n^*$ be the adjacency matrix of $Q^n$ with that ordering. Then $A_1^* = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} = A_1$ (because $Q^1$ is just a line segment connecting two vertices), and furthermore we have the same recurrence without the negative sign: 

\[
A_n^* = \begin{pmatrix} A_{n-1}^* & I_{2^{n-1}} \\ I_{2^{n-1}} & A_{n-1}^* \end{pmatrix}.
\]

This is because the top left block comes from just looking at the "lower level" of the hypercube (all vertices with first coordinate 0), which has adjacency given by the $(n - 1)$-hypercube graph $Q^{n-1}$. Similarly, the bottom right block comes from the "upper level" of the hypercube (with first coordinate 1). And from there, the edges between the lower and upper levels can only occur if we take the corresponding vertices on the two levels, since we can only have one coordinate of difference. So if we forget all of the signs in $A_n$, the signs in the recurrence relation don't matter anymore and we recover $A_n^*$. 

So now if we want to apply Lemma 70, we need the eigenvalues of the matrix $A_n$. But that turns out to not be too bad: 


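Lemma 71 
For all $n$, we have $A_n^2 = n I_{2^n}$. 

Proof of lemma. We proceed again by induction: for $n = 1$, we have $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}^2 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = I_2 = 1 \cdot I_{2^1}$. Now for the inductive step, notice that by block multiplication, 

\[
A_n^2 = \begin{pmatrix} A_{n-1} & I_{2^{n-1}} \\ I_{2^{n-1}} & -A_{n-1} \end{pmatrix} \begin{pmatrix} A_{n-1} & I_{2^{n-1}} \\ I_{2^{n-1}} & -A_{n-1} \end{pmatrix} = \begin{pmatrix} A_{n-1}^2 + I_{2^{n-1}} & A_{n-1} - A_{n-1} \\ A_{n-1} - A_{n-1} & I_{2^{n-1}} + A_{n-1}^2 \end{pmatrix},
\]

but now this simplifies nicely to 

\[
A_n^2 = \begin{pmatrix} (n-1) I_{2^{n-1}} + I_{2^{n-1}} & 0 \\ 0 & (n-1) I_{2^{n-1}} + I_{2^{n-1}} \end{pmatrix} = n I_{2^n},
\]

as desired. 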
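
The recurrence is easy to check mechanically for small $n$. The sketch below uses plain lists of integers and helper names of our own choosing (`signed_matrix`, `mat_mul`, `check`); it builds $A_n$ from the block recurrence and tests both that $A_n^2 = n I_{2^n}$ and that forgetting signs gives the adjacency matrix of $Q^n$ in lexicographic order (rows $i$ and $j$ are adjacent exactly when $i \oplus j$ is a power of two). 

```python
def signed_matrix(n):
    """A_n from the recurrence A_n = [[A_{n-1}, I], [I, -A_{n-1}]], A_1 = [[0, 1], [1, 0]]."""
    A = [[0, 1], [1, 0]]
    for _ in range(n - 1):
        size = len(A)
        top = [row + [int(i == j) for j in range(size)] for i, row in enumerate(A)]
        bottom = [[int(i == j) for j in range(size)] + [-a for a in row]
                  for i, row in enumerate(A)]
        A = top + bottom
    return A


def mat_mul(X, Y):
    """Product of two integer matrices given as lists of rows."""
    cols = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in X]


def check(n):
    """Return (A_n^2 == n*I, |A_n| is the adjacency matrix of Q^n)."""
    A = signed_matrix(n)
    N = 2 ** n
    squares_to_nI = mat_mul(A, A) == [[n * int(i == j) for j in range(N)] for i in range(N)]
    is_signed_adjacency = all(
        abs(A[i][j]) == int(bin(i ^ j).count("1") == 1)
        for i in range(N) for j in range(N)
    )
    return squares_to_nI, is_signed_adjacency


for n in range(1, 6):
    print(n, check(n))
```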


This means that $A_n^2$ has all eigenvalues equal to $n$, so $A_n$ must have all eigenvalues $\pm\sqrt{n}$. Specifically, because the trace of $A_n$ is zero (all diagonal entries are zero, or because the trace is $\operatorname{tr}(A_{n-1}) - \operatorname{tr}(A_{n-1}) = 0$ from the recursive formula), half of the eigenvalues of $A_n$ are $\sqrt{n}$, and the other half are $-\sqrt{n}$. So if we tried applying Lemma 70 right now to $A_n$, we'd find that the maximum degree of $Q^n$ is at least $\sqrt{n}$, which isn't helpful because the maximum degree is $n$. 

But now if we take any subset $H \subseteq \{0,1\}^n$ of size $2^{n-1} + 1$ of the vertices of our hypercube $Q^n$, we can let $G = Q^n[H]$ be the subgraph of $Q^n$ induced by $H$. Remembering that $A_n$ is a signed adjacency matrix for $Q^n$ if we label the rows and columns in lexicographic order, we know that $(A_n)_{u,v}$ is in $\{-1, 1\}$ if $u$ and $v$ are adjacent and 0 otherwise, so the $(2^{n-1} + 1) \times (2^{n-1} + 1)$ submatrix $B$ obtained by taking the rows and columns indexed by elements in $H$ is a signed adjacency matrix for $G$, and it's also symmetric. So by the Cauchy interlace theorem (in the form of Corollary 66, applied to the $2^n \times 2^n$ matrix $A_n$ with $j = 1$), the top eigenvalue $\mu_1$ of $B$ satisfies 

\[
\mu_1 \ge \lambda_{1 + 2^n - (2^{n-1} + 1)} = \lambda_{2^{n-1}} = \sqrt{n},
\]

so by Lemma 70 the maximum degree of $G$ is at least $\sqrt{n}$, as desired. 


We can in fact see in the proof that if we only took a $2^{n-1}$-vertex induced subgraph, the entire proof breaks down because the $(2^{n-1} + 1)$th eigenvalue of $A_n$ flips over to $-\sqrt{n}$ and we are no longer able to say anything useful! So it's pretty magical that everything works out so nicely (and in fact that the bound is sharp): the key was finding a signed matrix $A_n$ that manages to have a large $2^{n-1}$th eigenvalue. 

This is all we'll say about spectral graph theory for now. Our next topic will be the combinatorial nullstellensatz (a return to polynomials), which is not a particularly difficult statement on its own but turns out to be very useful in applications. We won't state the result until next time, instead just stating a lemma that will be helpful (which is actually sort of morally equivalent to the combinatorial nullstellensatz): 


Lemma 72 
Let $F$ be a field, and let $P \in F[x_1, \dots, x_n]$ be an $n$-variable polynomial such that the degree of $P$ in $x_i$ is at most $t_i$ for each $1 \le i \le n$. Also let $S_i \subseteq F$ be a subset of size $|S_i| = t_i + 1$ for each $i$. Then if $P(s_1, \dots, s_n) = 0$ for all $(s_1, \dots, s_n) \in S_1 \times \cdots \times S_n$, then $P$ is the zero polynomial. 

This is essentially a generalization of the result that a one-variable polynomial of degree at most $t$ with at least $(t + 1)$ roots is the zero polynomial, stated with more variables. In fact, we proved a special case of this for the finite field Kakeya problem, with the field being $\mathbb{F}_q$, all $t_i = q - 1$, and all sets equal to $\mathbb{F}_q$. When we did the proof on our homework, we did a more indirect proof, constructing a map between polynomials and functions, showing that it's surjective and thus bijective by dimension counting, and deducing injectivity from that. We'll present a more direct method of proof this time: 

Proof. As mentioned, the $n = 1$ case is showing that a nonzero one-variable polynomial of degree $t$ has at most $t$ zeros, which is true. For larger $n$, we use induction: consider $P(x_1, \dots, x_n)$ as a polynomial in $x_n$, and write it as 

\[
P(x_1, \dots, x_n) = Q_{t_n}(x_1, \dots, x_{n-1}) x_n^{t_n} + Q_{t_n - 1}(x_1, \dots, x_{n-1}) x_n^{t_n - 1} + \cdots + Q_0(x_1, \dots, x_{n-1}).
\]

Then for any fixed $(s_1, \dots, s_{n-1}) \in S_1 \times \cdots \times S_{n-1}$, we have a polynomial $P(s_1, \dots, s_{n-1}, x_n)$ of degree at most $t_n$ in the variable $x_n$, which has $(t_n + 1)$ roots (all elements of $S_n$) and thus must be the zero polynomial in $x_n$, meaning that all coefficients are zero: 

\[
Q_{t_n}(s_1, \dots, s_{n-1}) = \cdots = Q_0(s_1, \dots, s_{n-1}) = 0 \quad \text{for all } (s_1, \dots, s_{n-1}) \in S_1 \times \cdots \times S_{n-1}.
\]

But each of these polynomials $Q_k$ has degree at most $t_i$ in each variable $x_i$ (for $1 \le i \le n - 1$), so by the inductive hypothesis, each $Q_k$ is actually the zero polynomial, and thus plugging this back in gives us $P = 0$, completing the proof. 
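
For a tiny field the lemma can be confirmed by exhaustive search. The following sketch is only an illustration under our own choice of parameters (the field $\mathbb{F}_3$, two variables, and the helper names `vanishes_on_grid` and `count_vanishing` are not from the lecture): it counts the polynomials with prescribed per-variable degree bounds that vanish on a whole grid $S_1 \times S_2$. 

```python
from itertools import product


def vanishes_on_grid(coeffs, grids, p):
    """coeffs maps exponent tuples to coefficients in F_p; test P(s) = 0 on the whole grid."""
    for point in product(*grids):
        value = 0
        for exps, c in coeffs.items():
            term = c
            for s, e in zip(point, exps):
                term = term * pow(s, e, p) % p
            value = (value + term) % p
        if value != 0:
            return False
    return True


def count_vanishing(p, degrees, grids):
    """Count polynomials over F_p with degree in x_i at most degrees[i] that vanish on the grid."""
    monomials = list(product(*(range(t + 1) for t in degrees)))
    count = 0
    for values in product(range(p), repeat=len(monomials)):
        if vanishes_on_grid(dict(zip(monomials, values)), grids, p):
            count += 1
    return count


# |S_1| = 2 = t_1 + 1 and |S_2| = 3 = t_2 + 1: only the zero polynomial should vanish.
print(count_vanishing(3, (1, 2), [(0, 2), (0, 1, 2)]))
# Shrinking S_2 to two elements breaks the hypothesis, e.g. y(y - 1) now vanishes on the grid.
print(count_vanishing(3, (1, 2), [(0, 2), (0, 1)]))
```

The second call shows that the size condition $|S_i| = t_i + 1$ cannot simply be dropped. 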
Last lecture (before break), we stated and proved a lemma which we'll now be able to use to (relatively easily) prove 
the combinatorial nullstellensatz. That lemma essentially stated that if a polynomial $P \in F[x_1, \dots, x_n]$ has degree at most $t_i$ in each $x_i$, and we have sets $S_i$ of size $t_i + 1$ such that $P(s_1, \dots, s_n) = 0$ for all $s_i \in S_i$, then $P$ must be the zero polynomial. The next result will look pretty similar but slightly strange: 


Theorem 73 (Combinatorial Nullstellensatz (Alon)) 
Let $F$ be a field, and let $P \in F[x_1, \dots, x_n]$ be an $n$-variable polynomial of degree $d$. Suppose that some monomial $x_1^{t_1} \cdots x_n^{t_n}$ is a monomial of maximum degree $t_1 + \cdots + t_n = d$ appearing in $P$ with a nonzero coefficient. Let $S_1, \dots, S_n \subseteq F$ such that $|S_i| > t_i$ for all $i$. Then there is some $(s_1, \dots, s_n) \in S_1 \times \cdots \times S_n$ such that $P(s_1, \dots, s_n) \ne 0$. 

(The word "nullstellensatz" means "theorem about zeros of a polynomial," though we should notice that this result is telling us that not all of the points in $S_1 \times \cdots \times S_n$ are zeros. And in algebraic geometry, there's a deeper theorem called Hilbert's Nullstellensatz, and after the proof we'll explain how we can rewrite this theorem in a similar way to that one.) 


Proof. Without loss of generality we can assume that $|S_i| = t_i + 1$ for all $i$ (because decreasing the size of the sets only makes the theorem harder to prove). Throughout this proof, we'll fix the field $F$, as well as the exponents $t_1, \dots, t_n$ and sets $S_1, \dots, S_n$, but we'll be modifying the polynomial $P$ until we can apply the lemma from last time. (Right now, there might be other monomials of degree $d$, so it is not necessarily true that the degree of $P$ in the variable $x_i$ is at most $t_i$.) 

The idea is to subtract polynomials that vanish on all of $S_1 \times \cdots \times S_n$ from $P$, because making those modifications doesn't change the evaluation $P(s_1, \dots, s_n)$. Some simple polynomial "building blocks" that do the job are 

\[
g_i(x_1, \dots, x_n) = \prod_{s \in S_i} (x_i - s)
\]

for each $i$. This indeed vanishes on all of $S_1 \times \cdots \times S_n$ because one of the terms in the product is zero, and $g_i$ has degree $t_i + 1$. We'll now subtract multiples of these $g_i$s from $P$, and the idea is that whenever we have a monomial with some $x_i$ having degree at least $t_i + 1$, we can subtract off an appropriate multiple of $g_i$ to get rid of it, and then repeat this process until all of the offenders are gone. 




But it's generally a bit annoying to prove that some process terminates, and the easiest way is to consider some quantity that decreases when we repeat our subtraction process. And whenever we have a proof of that sort, the following strategy is usually best to make the proof short. Suppose we have a counterexample $P \in F[x_1, \dots, x_n]$ which contradicts our theorem, meaning that (remembering $t_1, \dots, t_n$ are all fixed) $x_1^{t_1} \cdots x_n^{t_n}$ appears with a nonzero coefficient but $P(s_1, \dots, s_n) = 0$ for all $s_i \in S_i$. Choose the counterexample to be minimal using the following metric: suppose that any monomial of degree $a$ that appears in $P$ contributes a weight $2^a$ to $P$ (so in particular, $P$ always has a contribution of weight $2^d$ from $x_1^{t_1} \cdots x_n^{t_n}$; we ignore the coefficient), and let the weight of $P$ be the sum of all weights from its monomials. We'll then pick $P$ to have the minimum weight. (This is possible because the weight is always a nonnegative integer and is always finite for a fixed $d$.) 


Since $P$ must contain the monomial $x_1^{t_1} \cdots x_n^{t_n}$, it cannot be the zero polynomial. So the conclusion of the lemma we proved last time is false, meaning that one of the assumptions does not hold. That's only possible if the degree assumption is not satisfied, meaning that there is some $1 \le i \le n$ such that the degree of $P$ is at least $t_i + 1$ in $x_i$. Without loss of generality, take $i = n$ by reindexing for notational convenience. We then know that there is some monomial $c x_1^{a_1} \cdots x_n^{a_n}$ that appears in $P$, such that $c \ne 0$ (nonzero coefficient), $a_n \ge t_n + 1$, and $a_1 + \cdots + a_n \le d$. We'll now get rid of this monomial by subtracting a multiple of $g_n$: define 

\[
P^*(x_1, \dots, x_n) = P(x_1, \dots, x_n) - c \, x_1^{a_1} \cdots x_{n-1}^{a_{n-1}} x_n^{a_n - t_n - 1} g_n(x_1, \dots, x_n).
\]

Because $g_n$ and $P$ both vanish on all of $S_1 \times \cdots \times S_n$, so does $P^*$. Also, the second term has degree $a_1 + \cdots + a_{n-1} + (a_n - t_n - 1) + (t_n + 1) \le d$, so $P^*$ is a difference of two terms of degree at most $d$ and thus has degree at most $d$. (And notice that we have used the fact that $a_n \ge t_n + 1$ to have a well-defined polynomial here.) Furthermore, $x_1^{t_1} \cdots x_n^{t_n}$ still appears with a nonzero coefficient in $P^*$: it did so in $P$, and the only monomial in the second term that could have degree $d$ is $x_1^{a_1} \cdots x_{n-1}^{a_{n-1}} x_n^{a_n - t_n - 1} x_n^{t_n + 1} = x_1^{a_1} \cdots x_n^{a_n}$ (grabbing the top monomial from $g_n$), which is not equal to $x_1^{t_1} \cdots x_n^{t_n}$ because $a_n > t_n$. So $P^*$ also satisfies the same counterexample conditions as $P$. 


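But now we claim that $P^*$ has lower weight than $P$. Indeed, the only terms that differ between the two are those of 

\[
c \, x_1^{a_1} \cdots x_{n-1}^{a_{n-1}} x_n^{a_n - t_n - 1} g_n(x_1, \dots, x_n) = c \, x_1^{a_1} \cdots x_{n-1}^{a_{n-1}} x_n^{a_n - t_n - 1} \left( x_n^{t_n + 1} + b_{t_n} x_n^{t_n} + \cdots + b_1 x_n + b_0 \right),
\]

where the coefficients $b_0, \dots, b_{t_n}$ are specified by the definition of $g_n$ but aren't important, and this expands out to (coefficients in parentheses) 

\[
(c) \, x_1^{a_1} \cdots x_{n-1}^{a_{n-1}} x_n^{a_n} + (b_{t_n} c) \, x_1^{a_1} \cdots x_{n-1}^{a_{n-1}} x_n^{a_n - 1} + \cdots + (b_0 c) \, x_1^{a_1} \cdots x_{n-1}^{a_{n-1}} x_n^{a_n - t_n - 1}.
\]

But the first term here cancels out exactly with the $c x_1^{a_1} \cdots x_{n-1}^{a_{n-1}} x_n^{a_n}$ which we assumed was in $P$, so the $2^a$ weight contribution (where $a = a_1 + \cdots + a_n$ is the degree of that monomial) disappears in $P^*$. For all of the other terms, the weight might be increased or decreased, but the maximum possible weight contribution we can gain back is $2^{a-1} + 2^{a-2} + \cdots < 2^a$, since there is at most one new monomial of each smaller degree. So overall, the weight of $P^*$ is smaller than the weight of $P$, contradicting the minimality assumption. Thus we could not have had our counterexample in the first place, and $P$ must take on a nonzero value on some point of $S_1 \times \cdots \times S_n$. 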


(This proof could have been rephrased without writing a counterexample by noting that the weight of $P$ gets smaller by at least 1 each time, and this might be a more natural way to think about the proof. But it's easier to write things down in this "minimal counterexample" way.) 

Fact 74 
Another way to state the combinatorial nullstellensatz is that any polynomial $P \in F[x_1, \dots, x_n]$ which vanishes on all of $S_1 \times \cdots \times S_n$ must be of the form $P = h_1 g_1 + \cdots + h_n g_n$, where the $g_i$s are as in the proof and the $h_i$s are polynomials of degree at most $\deg P - \deg g_i$. (This follows by thinking about the reduction resulting in the zero polynomial.) And this is similar to Hilbert's Nullstellensatz, which states that over an algebraically closed field and with any polynomials $g_1, \dots, g_n$, if $P$ vanishes on the common zeros of $g_1, \dots, g_n$, then $P^k = h_1 g_1 + \cdots + h_n g_n$ for some positive integer $k$ and polynomials $h_1, \dots, h_n$. 


We're now ready to see some combinatorial applications. The following result was posed by Komjáth and solved by Alon and Füredi before the combinatorial nullstellensatz was stated, but the proof ended up being one of the motivating reasons for that formulation: 

Theorem 75 
Let $H_1, \dots, H_m$ be (affine, $(n - 1)$-dimensional) hyperplanes in $\mathbb{R}^n$, such that all but one vertex of the hypercube $\{0, 1\}^n$ lies in $H_1 \cup \cdots \cup H_m$ (but the last vertex does not). Then $m \ge n$. 

(Notice that we would be able to take $m = 2$ if we didn't have the constraint that one of the vertices does not lie in the union by using the hyperplanes $x_1 = 0$ and $x_1 = 1$, and that wouldn't be very interesting.) To get equality in the result, we can use the hyperplanes $x_1 = 1, x_2 = 1, \dots, x_n = 1$, which cover all vertices except $(0, \dots, 0)$, or similarly we can use the hyperplanes $x_1 + \cdots + x_n = k$ for $k = 1, 2, \dots, n$. 
Proof. Without loss of generality, assume that the origin $(0, \dots, 0)$ is not covered, and suppose for contradiction that there is such a collection of hyperplanes with $m < n$. Each hyperplane $H_i$ is described by a linear equation of the form 

\[
h_i(x_1, \dots, x_n) = a_{i,1} x_1 + \cdots + a_{i,n} x_n - b_i = 0.
\]

To get the union of the hyperplanes, we can multiply the $h_i$s together: if $h_i$ vanishes precisely on $H_i$ for all $i$, then $h_1 h_2 \cdots h_m$ vanishes precisely on $H_1 \cup \cdots \cup H_m$, which contains all vertices of the hypercube except the origin. (In fact, $h_1 h_2 \cdots h_m(0, \dots, 0) = (-b_1)(-b_2) \cdots (-b_m)$, and we'll call this nonzero value $c$.) 

We're not ready to apply the combinatorial nullstellensatz yet, but we can consider the polynomial 

\[
P(x_1, \dots, x_n) = h_1(x_1, \dots, x_n) \cdots h_m(x_1, \dots, x_n) - c (1 - x_1) \cdots (1 - x_n).
\]

This second term vanishes at all points of the hypercube except the origin, and because of our choice of $c$, $P$ also vanishes at $(0, \dots, 0)$. So $P$ vanishes on the entire hypercube, and now we can check the conditions of the combinatorial nullstellensatz. The degree of $P$ is $n$, because there is an $x_1 \cdots x_n$ monomial in the second term but the first term only has degree $m < n$. Furthermore, setting all $t_i = 1$ and all $S_i = \{0, 1\}$ (having size $2 > 1$), notice that $x_1^{t_1} \cdots x_n^{t_n}$ appears in $P$ with a nonzero coefficient, in fact $(-1)^{n+1} c$, because again the first term $h_1 \cdots h_m$ cannot contribute to a degree-$n$ monomial. So all conditions are satisfied, meaning that there should be some point in $S_1 \times \cdots \times S_n$ (the hypercube) where $P$ is nonzero. This is a contradiction, so our initial assumption that $m < n$ must be incorrect (and $m \ge n$ is required). 
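
The two equality constructions mentioned before the proof are easy to check directly for small $n$. This is only an illustrative sketch with helper names of our own (`uncovered`, `coordinate_planes`, `level_planes`); each hyperplane is stored as a pair `(a, b)` meaning $a \cdot x = b$. 

```python
from itertools import product


def uncovered(n, hyperplanes):
    """Vertices of {0,1}^n that lie on none of the hyperplanes a.x = b."""
    return [
        v for v in product((0, 1), repeat=n)
        if all(sum(ai * vi for ai, vi in zip(a, v)) != b for a, b in hyperplanes)
    ]


def coordinate_planes(n):
    """The n hyperplanes x_i = 1 for i = 1, ..., n."""
    return [(tuple(int(i == j) for j in range(n)), 1) for i in range(n)]


def level_planes(n):
    """The n hyperplanes x_1 + ... + x_n = k for k = 1, ..., n."""
    return [((1,) * n, k) for k in range(1, n + 1)]


for n in range(1, 6):
    # Both families should leave exactly the origin uncovered.
    print(n, uncovered(n, coordinate_planes(n)), uncovered(n, level_planes(n)))
```

Dropping any one hyperplane from either family leaves more than one vertex uncovered, which is consistent with (though of course much weaker than) the bound $m \ge n$. 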


15 March 31, 2022 


As a reminder to the graduate students in the class, there is a vote next Monday and Tuesday about whether a union will be formed. Professor Goemans has already sent the math department an email of links to resources in favor and opposed to the union, but what's important is for everyone who is eligible to vote. (Also, our third homework assignment is due today.) 

We'll continue to talk about some applications of the combinatorial nullstellensatz today. This first result was initially proved in 1813 (by Cauchy), and it's about additive operations on sets: 


Theorem 76 (Cauchy-Davenport) 
Let $p$ be a prime, and let $A, B$ be nonempty subsets of $\mathbb{F}_p$. Let $A + B$ denote the set $\{a + b : a \in A, b \in B\}$. Then 

\[
|A + B| \ge \min(p, |A| + |B| - 1).
\]

If we look at the set of integers instead of $\mathbb{F}_p$, it's relatively easy to show that $|A + B| \ge |A| + |B| - 1$ (by ordering the sets and constructing a chain of increasing elements of $A + B$). But that argument doesn't work in $\mathbb{F}_p$ because of "wrap-around effects." It turns out this result is tight: for example, we can achieve equality by having $A = \{1, \dots, |A|\}$ and $B = \{1, \dots, |B|\}$ as long as $|A| + |B| \le p + 1$. And of course, we must always have $|A + B| \le p$ (because $\mathbb{F}_p$ only has $p$ elements). 


Proof. Since the result involves a funny-looking minimum, we'll divide the situation into two cases. In case 1, suppose $|A| + |B| - 1 \ge p$, and we want to prove that $|A + B| \ge p$. In other words, we must show that $A + B = \mathbb{F}_p$, or equivalently that any $x \in \mathbb{F}_p$ is the sum of an element in $A$ and an element in $B$. Indeed, the sets $A$ and $x - B = \{x - b : b \in B\}$ have a total of $|A| + |B| > p$ elements, so they must intersect and we have $a = x - b$ for some $a \in A, b \in B$, as desired. 

Now for case 2, suppose $|A| + |B| - 1 \le p$ (this overlaps with case 1 slightly, but it's good enough for our proof). Suppose for the sake of contradiction that $|A + B| \le |A| + |B| - 2 \le p - 1$. We wish to apply the combinatorial nullstellensatz, and we'll do this by finding a polynomial and a grid on which it vanishes. It makes sense to have that grid be $A \times B$, and the polynomial $P \in \mathbb{F}_p[x, y]$ we'll use is 

\[
P(x, y) = \prod_{c \in A + B} (x + y - c).
\]

The idea is that for any $a \in A$ and $b \in B$, $a + b$ will be some element $c \in A + B$, so $P(a, b) = 0$ for all $a \in A$ and $b \in B$. We'll now check the conditions of the combinatorial nullstellensatz: the degree of $P$ is $|A + B|$, and we should look at a monomial $x^{t_1} y^{t_2}$ such that $t_1 + t_2 = |A + B|$ (maximal degree) and $|A| > t_1$, $|B| > t_2$. Since we know that $|A + B| \le |A| + |B| - 2$, we are indeed always able to find $t_1$ and $t_2$ that satisfy those conditions (though note that depending on the actual degree of $P$, we might need to choose different values of $t_1, t_2$: we can't always pick $t_1 = |A| - 1$, $t_2 = |B| - 1$, for example, because $|A + B|$ might be strictly less than $|A| + |B| - 2$). 

Now the coefficient of $x^{t_1} y^{t_2}$ in $P$ is just $\binom{t_1 + t_2}{t_1}$, because we can't take any $c$'s for a maximal-degree monomial and thus the coefficient is the same as for $(x + y)^{|A + B|} = (x + y)^{t_1 + t_2}$. And that coefficient is nonzero, because we're working in $\mathbb{F}_p$ but $t_1 + t_2 \le p - 1$, so no factor in the binomial coefficient is divisible by $p$. So all conditions of the combinatorial nullstellensatz hold with $S_1 = A$, $S_2 = B$, and there should be a point $(a, b) \in A \times B$ with $P(a, b) \ne 0$, a contradiction. This finishes the proof. 
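
For small moduli the inequality can be verified exhaustively, which also shows why primality matters. The sketch below is an illustration only (the function names `nonempty_subsets` and `cauchy_davenport_holds` are our own); it checks every pair of nonempty subsets of $\mathbb{Z}/m$, so it is only practical for $m$ up to about 7. 

```python
from itertools import combinations


def nonempty_subsets(m):
    """All nonempty subsets of {0, ..., m-1}, as tuples."""
    for k in range(1, m + 1):
        yield from combinations(range(m), k)


def cauchy_davenport_holds(m):
    """Check |A+B| >= min(m, |A|+|B|-1) for all nonempty A, B in Z/m."""
    subsets = list(nonempty_subsets(m))
    for A in subsets:
        for B in subsets:
            sumset = {(a + b) % m for a in A for b in B}
            if len(sumset) < min(m, len(A) + len(B) - 1):
                return False
    return True


for m in (3, 4, 5, 6, 7):
    print(m, cauchy_davenport_holds(m))
```

For composite $m$ the check fails because of subgroups: with $m = 4$ and $A = B = \{0, 2\}$ we get $A + B = \{0, 2\}$, which has only 2 elements instead of the 3 that the bound would require. 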


Theorem 77 (Chevalley-Warning) 
Let $p$ be a prime, and let $Q_1, \dots, Q_m \in \mathbb{F}_p[x_1, \dots, x_n]$ be $m$ polynomials in $n$ variables such that $\deg(Q_1) + \cdots + \deg(Q_m) < n$. Then the number of common zeros of the polynomials $Q_1, \dots, Q_m$ in $\mathbb{F}_p^n$ is divisible by $p$. 

(Chevalley originally proved a weaker result, which is that whenever we have a common zero, there must be another one as well.) 


Proof. Let $A$ be the set of common zeros: 

\[
A = \{ (a_1, \dots, a_n) \in \mathbb{F}_p^n : Q_i(a_1, \dots, a_n) = 0 \text{ for all } 1 \le i \le m \}.
\]

We must show that $p$ divides $|A|$. Suppose for the sake of contradiction that this is not the case. It makes sense to construct another polynomial which encodes the property of being a common zero. Taking the product of the $Q_i$s would give us the union of the zeros, but if we want the intersection of the zeros we want to do something like 

\[
P^*(x_1, \dots, x_n) = \prod_{i=1}^{m} \left( 1 - Q_i(x_1, \dots, x_n)^{p-1} \right).
\]

This polynomial ends up being 1 at a point $(a_1, \dots, a_n)$ that is a common zero of all polynomials (because it's a product of $(1 - 0)$s), and otherwise it ends up being 0 because $a^{p-1} = 1$ for any nonzero $a \in \mathbb{F}_p$. Furthermore, $\deg P^* \le (p - 1)(\deg(Q_1) + \deg(Q_2) + \cdots + \deg(Q_m)) < (p - 1) n$ by assumption. Applying the combinatorial nullstellensatz to this would not be very useful, because we already know when $P^*$ is nonzero and in fact already know there is a common zero (because we assumed $|A|$ is not divisible by $p$). So instead, we'll consider the polynomial which actually subtracts off the 1s at each common zero $(a_1, \dots, a_n)$: 

\[
P(x_1, \dots, x_n) = P^*(x_1, \dots, x_n) - \sum_{(a_1, \dots, a_n) \in A} \left( 1 - (x_1 - a_1)^{p-1} \right) \cdots \left( 1 - (x_n - a_n)^{p-1} \right).
\]

Indeed, this second term is what we constructed on our homework: each summand vanishes whenever $x_i \ne a_i$ for any $i$ because our product gets a $(1 - 1)$ factor, and otherwise we get $(1 - 0) \cdots (1 - 0) = 1$. So we've now constructed a polynomial which vanishes on all of $\mathbb{F}_p^n$. 

To apply the combinatorial nullstellensatz, we now need to extract more properties of $P$. We know that $\deg P \le (p - 1) n$, because $P^*$ has degree less than $(p - 1) n$ and each term in the sum has degree $(p - 1) n$. We'll now take $t_1 = \cdots = t_n = p - 1$ and let each $S_i$ be all of $\mathbb{F}_p$. The monomial $x_1^{p-1} \cdots x_n^{p-1}$ now has coefficient $|A| (-1)^{n+1}$, because there are no contributions from $P^*$ and each term in the sum gives us a $(-1)^{n+1}$ coefficient. This is nonzero by assumption because $|A| \ne 0$ in $\mathbb{F}_p$ (in fact, this tells us that the degree of $P$ is actually $(p - 1) n$, so it's valid to take $t_1 = \cdots = t_n = p - 1$), so the conditions of the combinatorial nullstellensatz hold and we have that $P$ is nonzero somewhere. This is a contradiction, and thus $|A|$ must be divisible by $p$. 
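
To see the theorem and the indicator polynomial $P^*$ in action, here is a small brute-force sketch. The example polynomials and the helper names `common_zeros` and `indicator` are our own choices for illustration; polynomials are passed as ordinary Python functions and evaluated mod $p$. 

```python
from itertools import product


def common_zeros(polys, p, n):
    """All points of F_p^n where every polynomial (a Python function) vanishes mod p."""
    return [x for x in product(range(p), repeat=n)
            if all(Q(*x) % p == 0 for Q in polys)]


def indicator(polys, p, x):
    """P*(x) = prod_i (1 - Q_i(x)^(p-1)) mod p: 1 at common zeros, 0 elsewhere."""
    value = 1
    for Q in polys:
        value = value * (1 - pow(Q(*x) % p, p - 1, p)) % p
    return value


# One quadratic in three variables over F_3: total degree 2 < 3.
sphere = [lambda x, y, z: x * x + y * y + z * z]
print(len(common_zeros(sphere, 3, 3)))

# Degrees 2 + 1 = 3 < 4 variables over F_5.
system = [lambda x, y, z, w: x * y + z, lambda x, y, z, w: w + x]
print(len(common_zeros(system, 5, 4)))

# Degree 2 is NOT less than n = 2 here, so the theorem does not apply.
circle = [lambda x, y: x * x + y * y]
print(len(common_zeros(circle, 3, 2)))
```

The first two counts are multiples of the respective primes, while the last example (where the degree condition fails) has the origin as its only zero. 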


Remark 78. An alternative proof that doesn't use the combinatorial nullstellensatz is to notice (after constructing $P^*$ above) that 

\[
|A| = \sum_{(a_1, \dots, a_n) \in \mathbb{F}_p^n} P^*(a_1, \dots, a_n) \quad \text{in } \mathbb{F}_p.
\]

But if we look at any individual monomial that might show up in $P^*$, adding it over all of $\mathbb{F}_p^n$ will always give us 0 because of the degree condition (details left to us). Thus $|A| = 0$ in $\mathbb{F}_p$. 
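
The omitted detail rests on the one-variable fact that $\sum_{a \in \mathbb{F}_p} a^k = 0$ whenever $0 \le k < p - 1$; a monomial of total degree less than $(p - 1) n$ has some exponent below $p - 1$, and its sum over $\mathbb{F}_p^n$ factors as a product of such one-variable sums. A minimal numerical sketch (with function names `power_sum` and `monomial_sum` of our own, and the convention $0^0 = 1$): 

```python
def power_sum(p, k):
    """Sum of a^k over all a in F_p (with 0^0 = 1), reduced mod p."""
    return sum(pow(a, k, p) for a in range(p)) % p


def monomial_sum(p, exponents):
    """Sum of x_1^{e_1} ... x_n^{e_n} over F_p^n, as a product of one-variable sums."""
    result = 1
    for e in exponents:
        result = result * power_sum(p, e) % p
    return result


for p in (3, 5, 7):
    # Exponents 0, ..., p-2 give 0; exponent p-1 gives p-1, that is, -1 mod p.
    print(p, [power_sum(p, k) for k in range(p)])
```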


We'll next discuss the "Erdős-Ginzburg-Ziv constant of $\mathbb{F}_p$" (this name will make sense later): 

Theorem 79 (Erdős-Ginzburg-Ziv (1961)) 
Let $p$ be a prime. Then any sequence of elements of $\mathbb{F}_p$ of length $(2p - 1)$ contains a subsequence of length $p$ of sum zero. 

(This result might look strange, but zero-sum-subsequence problems are a whole field in combinatorics.) The value $(2p - 1)$ above is tight: indeed, consider $(p - 1)$ 1s followed by $(p - 1)$ 0s. Then any length $p$ subsequence must have anywhere between 1 and $(p - 1)$ 1s and thus does not sum to 0 in $\mathbb{F}_p$. 


Proof. Write the sequence as $a_1, \dots, a_{2p-1}$. We wish to pick “indicator” variables $x_1, \dots, x_{2p-1} \in \{0, 1\}$ such that $x_1 + \cdots + x_{2p-1} = p$ in $\mathbb{Z}$ (not just in $\mathbb{F}_p$) and $x_1 a_1 + \cdots + x_{2p-1} a_{2p-1} = 0$ in $\mathbb{F}_p$. We'll use these $x_i$ as our variables and $\{0, 1\}$ as our sets $S_i$, and towards applying the combinatorial nullstellensatz we'll consider the polynomial $(x_1 a_1 + \cdots + x_{2p-1} a_{2p-1})^{p-1} - 1$. By Fermat's little theorem, this polynomial is zero whenever our condition $x_1 a_1 + \cdots + x_{2p-1} a_{2p-1} = 0$ is not satisfied and $-1$ whenever it is. We also need to require that $x_1 + \cdots + x_{2p-1} = p$ in $\mathbb{Z}$, which is a harder condition to encode, but we can still say that $(x_1 + \cdots + x_{2p-1})^{p-1} - 1$ is zero whenever $x_1 + \cdots + x_{2p-1} \ne 0$ in $\mathbb{F}_p$. So putting this together, we can consider

\[ P^*(x_1, \dots, x_{2p-1}) = \left[ (x_1 a_1 + \cdots + x_{2p-1} a_{2p-1})^{p-1} - 1 \right] \left[ (x_1 + \cdots + x_{2p-1})^{p-1} - 1 \right]. \]

This polynomial indeed vanishes unless $x_1 a_1 + \cdots + x_{2p-1} a_{2p-1} = 0$ and $x_1 + \cdots + x_{2p-1} = 0$ in $\mathbb{F}_p$, and it has degree at most $2(p-1)$. But since $0 \le x_1 + \cdots + x_{2p-1} \le 2p - 1$, those conditions hold either if we have a valid subsequence of length $p$ or if all $x_i$ are zero. And indeed, $P^*(0, \dots, 0) = (-1)(-1) = 1$, so we need to do something extra to $P^*$ to allow us to apply the combinatorial nullstellensatz. Specifically, define

\[ P(x_1, \dots, x_{2p-1}) = P^*(x_1, \dots, x_{2p-1}) - \prod_{i=1}^{2p-1} (1 - x_i). \]

This is an important detail: normally, we'd use $\prod_{i=1}^{2p-1} (1 - x_i^{p-1})$, but that makes the degree of $P$ gigantic. But because we're only interested in the polynomial vanishing on $\{0, 1\}^{2p-1}$ eventually, we can reduce the degree dramatically. So now $P(0, \dots, 0) = 0$, and on the rest of $\{0, 1\}^{2p-1}$, $P$ agrees with $P^*$ and is nonzero only when we have a valid subsequence corresponding to $x_1, \dots, x_{2p-1}$.

Moreover, since $P^*$ has degree at most $2p - 2$, the coefficient of $x_1 \cdots x_{2p-1}$ in $P$ comes only from the product and equals $-(-1)^{2p-1} = 1 \ne 0$, so $P$ has degree exactly $2p - 1$. So applying the combinatorial nullstellensatz to the polynomial $P$, setting all $t_i = 1$, and using the sets $S_i = \{0, 1\}$, there must be a point $(x_1, \dots, x_{2p-1}) \in \{0, 1\}^{2p-1}$ with $P(x_1, \dots, x_{2p-1}) \ne 0$. We've calculated that this point is not the origin, so because $\prod_i (1 - x_i)$ vanishes on the rest of $\{0, 1\}^{2p-1}$, this means $P^*(x_1, \dots, x_{2p-1}) \ne 0$. That happens only if $x_1 a_1 + \cdots + x_{2p-1} a_{2p-1} = 0$ and $x_1 + \cdots + x_{2p-1} = 0$ in $\mathbb{F}_p$, which implies that $x_1 + \cdots + x_{2p-1} = p$ in $\mathbb{Z}$ because it's not zero and is at most $2p - 1$. Thus we have a characteristic vector that encodes our subsequence, and we're done.
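To make the role of the two polynomials concrete, here is a small Python sketch (our own illustration, with hypothetical function names) that evaluates $P^*$ and $P$ modulo $p$ on all of $\{0,1\}^{2p-1}$ for a given sequence. Under the argument above, the points where $P$ is nonzero should be exactly the indicator vectors of zero-sum subsequences of length $p$.

```python
from itertools import product


def P_star(x, a, p):
    """[(sum x_i a_i)^(p-1) - 1] * [(sum x_i)^(p-1) - 1] modulo p."""
    weighted = sum(xi * ai for xi, ai in zip(x, a)) % p
    size = sum(x) % p
    return ((pow(weighted, p - 1, p) - 1) * (pow(size, p - 1, p) - 1)) % p


def P(x, a, p):
    """P = P_star minus the multilinear indicator of the origin, modulo p."""
    origin_indicator = 1
    for xi in x:
        origin_indicator *= 1 - xi
    return (P_star(x, a, p) - origin_indicator) % p


def nonvanishing_points(a, p):
    """All points of {0,1}^len(a) where P is nonzero modulo p."""
    return [x for x in product((0, 1), repeat=len(a)) if P(x, a, p) != 0]
```

For instance, `nonvanishing_points([0, 1, 2, 2, 1], 3)` lists the indicator vectors of the three-term subsequences summing to 0 modulo 3.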


16 April 5, 2022 


Last week, we proved a statement about subsequences of $(2p - 1)$-element sequences in $\mathbb{F}_p$ (really, $\mathbb{Z}_p$). We can generalize this to the following problem:


Problem 80


Given positive integers $m$ and $n$, what is the minimum $s$ such that among any $s$ points in $\mathbb{Z}^n$, one can find $m$ points whose centroid is also a lattice point in $\mathbb{Z}^n$?


We can notice that such an $s$ always exists: indeed, asking for $m$ points with a lattice point centroid is the same as asking for the sum of coordinates (in each direction) to be a multiple of $m$. So we can “project down to $\mathbb{Z}_m^n$” (just consider the residue classes of coordinates mod $m$), and now if we have $m$ points in any one of those $m^n$ residue classes, those $m$ points have a lattice point centroid. So by the pigeonhole principle, $s = (m-1)m^n + 1$ is definitely enough, and we can restate the problem above equivalently with this language:


Problem 81 


Given positive integers $m$ and $n$, what is the minimum $s$ such that any sequence of $s$ (not necessarily distinct) elements in $\mathbb{Z}_m^n$ has a subsequence of length $m$ whose elements sum to 0 in $\mathbb{Z}_m^n$? (This $s$ is known as the Erdős-Ginzburg-Ziv constant and is denoted $s(\mathbb{Z}_m^n)$.)


This is the version of the problem we solved last time with $m = p$, $n = 1$, and we showed that $s(\mathbb{Z}_p) = 2p - 1$ in that case using the combinatorial nullstellensatz. And the pigeonhole argument above proves that $s(\mathbb{Z}_m^n) \le (m-1)m^n + 1$ for any $m, n$, and in fact for $m = 2$ this bound is tight (that is, $s(\mathbb{Z}_2^n) = 2^n + 1$). This is because in characteristic 2, the only way to get a subsequence $a_1, a_2$ of length 2 with $a_1 + a_2 = 0$ is if $a_1 = a_2$, so the pigeonhole principle is the only “limiting factor.”


Fact 82
Determining these Erdős-Ginzburg-Ziv constants is difficult in general. Erdős, Ginzburg, and Ziv proved in 1961 that $s(\mathbb{Z}_m) = 2m - 1$ for general $m$ (and $n = 1$), and Reiher proved in 2003 that $s(\mathbb{Z}_m^2) = 4m - 3$. Also, for $m = 2^\ell$ (and all $n$), we know $s(\mathbb{Z}_{2^\ell}^n) = (2^\ell - 1)2^n + 1$. But that’s all of the “infinite families” for which we know the answer.

On the other hand, we have the general lower bound $s(\mathbb{Z}_m^n) \ge (m-1)2^n + 1$, because we can always take $(m-1)$ copies of each of the $2^n$ vectors in $\{0, 1\}^n$, and that sequence has no subsequence of length $m$ summing to zero.
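The smallest of these constants can be recomputed by exhaustive search. The sketch below is our own illustration (the helper names are hypothetical), and it is only practical for tiny $m$ and $n$: it raises $s$ until every multiset of $s$ elements of $\mathbb{Z}_m^n$ contains $m$ elements summing to zero. This works because the property is monotone in $s$.

```python
from itertools import combinations, combinations_with_replacement, product


def has_zero_sum(seq, m):
    """True if some m vectors of seq sum to the zero vector modulo m."""
    return any(
        all(sum(column) % m == 0 for column in zip(*choice))
        for choice in combinations(seq, m)
    )


def egz_constant(m, n):
    """Smallest s such that every length-s sequence in Z_m^n has a zero-sum m-subsequence."""
    group = list(product(range(m), repeat=n))
    s = m  # a zero-sum subsequence of length m needs at least m terms
    while not all(
        has_zero_sum(seq, m) for seq in combinations_with_replacement(group, s)
    ):
        s += 1
    return s
```

The values returned for small inputs can be compared against the formulas $2m - 1$ (for $n = 1$) and $2^n + 1$ (for $m = 2$) from the fact above.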


To shed light on how these results relate to each other, we have the following: 


Lemma 83 


For all positive integers $m, n, k$, we have $s(\mathbb{Z}_{mk}^n) \le k(s(\mathbb{Z}_m^n) - 1) + s(\mathbb{Z}_k^n)$.


In other words, it suffices to really consider the case where $m$ is prime, because we can then repeatedly apply the lemma. For example, plugging in $s(\mathbb{Z}_p) = 2p - 1$ gives us the upper bound $s(\mathbb{Z}_m) \le 2m - 1$ for all $m$, which matches the general lower bound above. (We'll see the proof of this on our last homework assignment for the class.)

Following these arguments, we've thus basically proved all of the statements in Fact 82 except that $s(\mathbb{Z}_m^2) = 4m - 3$. To show the lower bound for that, we use the general lower bound for $s(\mathbb{Z}_m^n)$ above: consider the sequence of length $4m - 4$ containing $(m-1)$ copies of each of $(0,0)$, $(0,1)$, $(1,0)$, and $(1,1)$. Then the only way to get a sum of $m$ terms to be zero is if we have a sum (in $\mathbb{Z}$) of either 0 or $m$ in each coordinate, which forces all $m$ chosen terms to be the same point. That's not possible because we only have $(m-1)$ copies of each point.
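The lower-bound construction can be checked directly for small $m$. This is a sketch under our own naming (neither function comes from the lecture); it builds $(m-1)$ copies of every vector in $\{0,1\}^n$ and searches for $m$ terms summing to zero modulo $m$.

```python
from itertools import combinations, product


def lower_bound_sequence(m, n=2):
    """(m - 1) copies of every vector in {0,1}^n, so the length is (m - 1) * 2^n."""
    return [v for v in product((0, 1), repeat=n) for _ in range(m - 1)]


def has_zero_sum(seq, m):
    """True if some m vectors of seq sum to the zero vector modulo m."""
    return any(
        all(sum(column) % m == 0 for column in zip(*choice))
        for choice in combinations(seq, m)
    )
```

For $n = 2$ the sequence has length $4m - 4$, and the search should come back empty, in line with $s(\mathbb{Z}_m^2) \ge 4m - 3$.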

For the upper bound, it turns out that Lemma 83 allows us to just prove the statement for $m$ prime. Indeed, induct on the number of prime factors of $m$: for the inductive step, if $s(\mathbb{Z}_m^2) = 4m - 3$ and $s(\mathbb{Z}_k^2) = 4k - 3$, then

\[ s(\mathbb{Z}_{mk}^2) \le k((4m - 3) - 1) + (4k - 3) = 4mk - 3. \]


So now let’s state the upper bound in a way that doesn't require the definition of $s(\mathbb{Z}_m^n)$. In particular, to avoid multisets, we'll switch to $\mathbb{Z}^2$ so that we can just use different points to represent the same class in $\mathbb{Z}_p^2$:

Theorem 84 (Reiher, 2003)
We have $s(\mathbb{Z}_p^2) \le 4p - 3$ for all primes $p$. In other words, for any prime $p \ge 3$ and any subset $X \subseteq \mathbb{Z}^2$ of size $|X| = 4p - 3$, there is some subset $I \subseteq X$ of size $|I| = p$ such that the sum of the points in $I$ has both coordinates divisible by $p$.


(The $p = 2$ case has already been handled by the pigeonhole principle, so we don't need to worry about it.) We'll first do an easier case where we're given an extra point to work with:


Start of proof for $|X| = 4p - 2$. We'll use the following notation: if we have a subset $Y \subseteq \mathbb{Z}^2$ and an integer $k \ge 0$, we'll let $(k|Y)$ denote the number of subsets $I$ of $Y$ of size $k$ such that the sum of the points in $I$ has both coordinates divisible by $p$. For example, $(0|Y) = 1$ because the empty subset has sum of points 0 in each coordinate, and $(k|Y) = 0$ if $k > |Y|$. Our goal is then to prove that $(p|X) > 0$ if $|X| = 4p - 3$ (but for this proof just $|X| = 4p - 2$).


Lemma 85 


For any subset $Y \subseteq \mathbb{Z}^2$ of size at least $3p - 2$, we have

\[ 1 - (p|Y) + (2p|Y) - (3p|Y) + \cdots \equiv 0 \pmod{p}. \]


Note that the first term in this sum is just (0|Y), and the sum will terminate because (k|Y) = 0 for large enough 
k. And the reason we're proving such a weird statement is that it’s hard to say anything about a particular (k|Y), 
because the combinatorial nullstellensatz can’t distinguish between sizes of sets mod p. (We got around this with 


s(Zp) = 2p — 1 because luckily (2p — 1) is smaller than 2p, but that was basically a lucky coincidence.) 
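Lemma 85 is easy to test numerically for small primes. Below is a brute-force Python sketch (the function names and the random test sets are our own assumptions, not part of the lecture) that computes $(k|Y)$ directly from the definition and evaluates the alternating sum.

```python
import random
from itertools import combinations


def count_zero_sum_subsets(k, Y, p):
    """(k|Y): number of k-element subsets of Y with both coordinate sums divisible by p."""
    return sum(
        1
        for I in combinations(Y, k)
        if sum(a for a, _ in I) % p == 0 and sum(b for _, b in I) % p == 0
    )


def alternating_sum(Y, p):
    """1 - (p|Y) + (2p|Y) - (3p|Y) + ... as an integer."""
    return sum(
        (-1) ** j * count_zero_sum_subsets(j * p, Y, p)
        for j in range(len(Y) // p + 1)
    )


def random_point_set(size, rng, bound=50):
    """A sorted list of `size` distinct integer points in [0, bound)^2."""
    points = set()
    while len(points) < size:
        points.add((rng.randrange(bound), rng.randrange(bound)))
    return sorted(points)
```

For any `Y` with at least $3p - 2$ points, `alternating_sum(Y, p) % p` should be 0 if the lemma holds.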


Proof of lemma. Write the points in $Y$ as $\{(a_1, b_1), \dots, (a_n, b_n)\} \subseteq \mathbb{Z}^2$ (where $|Y| = n \ge 3p - 2$ by assumption). We're interested in subsets $I$ of size divisible by $p$ with coordinate sums of elements in $I$ both zero mod $p$. Because $p$ is an odd prime, notice that the sum $\sum_I (-1)^{|I|}$ over all such valid subsets $I$ is exactly the left-hand side of the lemma, because all size $p, 3p, 5p, \dots$ subsets contribute a $-1$ to the sum, and all size $0, 2p, 4p, \dots$ subsets contribute a $+1$ to the sum, just like in the original alternating sum. Thus we must show that $\sum_I (-1)^{|I|}$ is a multiple of $p$.

Indeed, any subset $I$ of $Y$ corresponds to an indicator vector $(x_1, \dots, x_n) \in \{0, 1\}^n$, where $x_i$ is 1 if and only if $I$ contains the point $(a_i, b_i)$. With that notation, we're asking for our subset to satisfy

\[ x_1 a_1 + \cdots + x_n a_n = 0, \qquad x_1 b_1 + \cdots + x_n b_n = 0, \qquad x_1 + \cdots + x_n = 0, \]

all in $\mathbb{F}_p$, so in other words the thing we are trying to compute on the left-hand side (and show is 0 in $\mathbb{F}_p$) is

\[ \sum_{\substack{x = (x_1, \dots, x_n) \in \{0,1\}^n \\ \text{satisfying above constraints}}} (-1)^{|x|}, \]

where $|x| = x_1 + \cdots + x_n$ denotes the number of ones in the indicator vector $(x_1, \dots, x_n)$. To encode this sum more explicitly, we want a polynomial $Q(x_1, \dots, x_n)$ which encodes whether those constraints are satisfied, and like last lecture, we'll use

\[ Q(x_1, \dots, x_n) = \left(1 - (x_1 a_1 + \cdots + x_n a_n)^{p-1}\right) \left(1 - (x_1 b_1 + \cdots + x_n b_n)^{p-1}\right) \left(1 - (x_1 + \cdots + x_n)^{p-1}\right). \]


This polynomial evaluates to 1 in $\mathbb{F}_p$ if all conditions are satisfied, and otherwise it evaluates to 0 by Fermat's little theorem (because one of the three sums will be nonzero, so its $(p-1)$th power will be 1). But we don’t want to use the combinatorial nullstellensatz on $Q$ just yet, because that’s not going to give us useful statements of the sort we want. Instead, we should use a polynomial where we're “surprised” if it doesn’t vanish, and much like last time we subtract the indicator function of each of the points $(x_1, \dots, x_n)$ corresponding to a subset $I$ satisfying the conditions. For any $y = (y_1, \dots, y_n)$ satisfying our conditions, we can define $\delta_y(x_1, \dots, x_n)$ to be the unique multilinear polynomial in the variables $x_1, \dots, x_n$ such that for all $(x_1, \dots, x_n) \in \{0, 1\}^n$,

\[ \delta_y(x_1, \dots, x_n) = \begin{cases} 1 & x = y, \\ 0 & \text{otherwise}. \end{cases} \]

By “multilinear” here, we mean that the polynomial $\delta_y$ only contains monomials where each $x_i$ has exponent at most 1, and this is fine because we're only going to need our modified polynomial to vanish on $\{0, 1\}^n$. For example, if $y = (1, \dots, 1)$, then $\delta_y(x_1, \dots, x_n) = x_1 \cdots x_n$, and if $y = (0, \dots, 0)$, then $\delta_y(x_1, \dots, x_n) = (1 - x_1) \cdots (1 - x_n)$. More generally, $\delta_y$ is a product with factor $x_i$ if $y_i = 1$ and $(1 - x_i)$ otherwise. Notice that $x_1 \cdots x_n$ has coefficient $(-1)^{n - |y|} = (-1)^n (-1)^{|y|}$ in $\delta_y$ (because we get a negative sign from each zero in $y$). With this, we can finally construct the polynomial

\[ P(x_1, \dots, x_n) = Q(x_1, \dots, x_n) - \sum_{\substack{y = (y_1, \dots, y_n) \in \{0,1\}^n \\ \text{satisfying above constraints}}} \delta_y(x_1, \dots, x_n). \]
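The claim about the top coefficient of $\delta_y$ can be verified by expanding the product symbolically. The sketch below is our own illustration: it represents a multilinear polynomial as a dictionary from sets of variable indices to integer coefficients (a representation we chose for convenience), expands $\delta_y$, and evaluates it on points of $\{0,1\}^n$.

```python
from itertools import product


def multiply(poly, factor):
    """Multiply two multilinear polynomials in disjoint variables."""
    result = {}
    for mono_a, coeff_a in poly.items():
        for mono_b, coeff_b in factor.items():
            mono = mono_a | mono_b
            result[mono] = result.get(mono, 0) + coeff_a * coeff_b
    return result


def delta(y):
    """Expand prod_i (x_i if y_i == 1 else 1 - x_i) as {frozenset of indices: coefficient}."""
    poly = {frozenset(): 1}
    for i, yi in enumerate(y):
        if yi == 1:
            factor = {frozenset([i]): 1}
        else:
            factor = {frozenset(): 1, frozenset([i]): -1}
        poly = multiply(poly, factor)
    return poly


def evaluate(poly, x):
    """Evaluate a multilinear polynomial at a point x of {0,1}^n."""
    return sum(c for mono, c in poly.items() if all(x[i] == 1 for i in mono))
```

The coefficient of the full monomial is `delta(y).get(frozenset(range(len(y))), 0)`, which can be compared with $(-1)^{n - |y|}$.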


This polynomial vanishes on all of $\{0, 1\}^n$, and we want to look at the coefficient of $x_1 \cdots x_n$ in $P$. There’s no contribution to that coefficient from $Q$, because $Q$ has degree at most $3p - 3 < n$ by assumption of the lemma. Since we get a $(-1)^n (-1)^{|y|}$ from each $y$ satisfying the condition, the total coefficient of $x_1 \cdots x_n$ is

\[ - \sum_{\substack{y = (y_1, \dots, y_n) \in \{0,1\}^n \\ \text{satisfying above constraints}}} (-1)^n (-1)^{|y|}. \]

Now assume for the sake of contradiction that this coefficient is nonzero in $\mathbb{F}_p$. Then $P$ is a polynomial of degree $n$ (because each term in the definition of $P$ has degree at most $n$ and the $x_1 \cdots x_n$ coefficient is nonzero), and applying the combinatorial nullstellensatz with all $t_i = 1$ and all sets $S_i = \{0, 1\}$, we find that $P$ must be nonzero somewhere on $\{0, 1\}^n$, which is a contradiction. Thus we must indeed have this sum be zero, meaning that

\[ - \sum_{\substack{y = (y_1, \dots, y_n) \in \{0,1\}^n \\ \text{satisfying above constraints}}} (-1)^n (-1)^{|y|} = 0 \quad \text{in } \mathbb{F}_p. \]

Dividing by $(-1)^{n+1}$ gives us the desired result.


In other words, this lemma shows us that the terms $1 - (p|Y) + (2p|Y) - \cdots$ together form (up to sign) the vanishing coefficient of $x_1 \cdots x_n$ in the polynomial $P$.


Lemma 86 


Suppose $Y \subseteq \mathbb{Z}^2$ has size $|Y| = 3p$, and the sum of all points in $Y$ has both coordinates divisible by $p$. Then $(p|Y) > 0$.


In other words, if both coordinate sums are zero mod $p$ for all of $Y$, then the same is true for some $p$-element subset of $Y$.


Proof of lemma. Suppose for the sake of contradiction that $(p|Y) = 0$. If $Y'$ is any subset of $Y$ of size $3p - 1$, then $(p|Y') = 0$ as well (because if we can’t find any $p$-element subsets of $Y$ that satisfy the coordinate sum condition, we definitely won't find any in $Y'$). Then by Lemma 85, we have $1 - (p|Y') + (2p|Y') \equiv 0 \pmod{p}$ because $Y'$ only has size $3p - 1$, and this means that $(2p|Y') \equiv -1 \pmod{p}$. In particular, there is some subset $S \subseteq Y'$ of size $2p$ with coordinate sum zero, but then this is a contradiction because the complement $Y \setminus S$ is a set of size $p$ with coordinate sum zero as well (since the total coordinate sum is zero).
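As a quick experiment, we can generate random sets $Y$ of $3p$ distinct integer points whose total sum is divisible by $p$ in both coordinates and look for a $p$-element zero-sum subset. This is only a sketch: the generator `random_zero_sum_set` and its `bound` parameter are our own hypothetical choices.

```python
import random
from itertools import combinations


def count_zero_sum_subsets(k, Y, p):
    """(k|Y): number of k-element subsets of Y with both coordinate sums divisible by p."""
    return sum(
        1
        for I in combinations(Y, k)
        if sum(a for a, _ in I) % p == 0 and sum(b for _, b in I) % p == 0
    )


def random_zero_sum_set(p, rng, bound=30):
    """3p distinct integer points whose coordinate sums are both divisible by p."""
    while True:
        points = set()
        while len(points) < 3 * p - 1:
            points.add((rng.randrange(bound), rng.randrange(bound)))
        sa = sum(a for a, _ in points)
        sb = sum(b for _, b in points)
        # Choose the last point so that both total sums become 0 modulo p.
        last = (
            (-sa) % p + p * rng.randrange(bound),
            (-sb) % p + p * rng.randrange(bound),
        )
        if last not in points:
            return sorted(points | {last})
```

For each generated `Y`, Lemma 86 predicts `count_zero_sum_subsets(p, Y, p) > 0`.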


Next lecture, we'll actually prove the $(4p - 2)$ bound using these lemmas. The key idea will be to go further by looking at $(3p - 2)$-element subsets of our set $X$ of size $4p - 2$.


17 April 7, 2022 


We'll continue the proof from last time, working towards determining $s(\mathbb{Z}_m^n)$ for $n = 2$. In particular, last time, we reduced the problem to the case where $m$ is an odd prime, and our goal became showing that for a subset $X \subseteq \mathbb{Z}^2$ of size $4p - 3$, there is a subset $I \subseteq X$ of size $p$ with sum of both coordinates divisible by $p$.


Continuation of proof for $|X| = 4p - 2$. Recall our notation from last time: for a subset $Y \subseteq \mathbb{Z}^2$, we let $(k|Y)$ be the number of size-$k$ subsets with sum of both coordinates divisible by $p$. (So our eventual goal is to show that whenever $X$ has size at least $4p - 3$ (but $4p - 2$ in this proof), we have $(p|X) > 0$.) We proved last time that whenever $Y$ is a subset of size at least $3p - 2$, we have $1 - (p|Y) + (2p|Y) - (3p|Y) + \cdots \equiv 0 \pmod{p}$ (using the combinatorial nullstellensatz), and from that we proved that for any subset $Y$ of size $3p$ with both coordinate sums of all points divisible by $p$, we must have $(p|Y) > 0$.

We can now do the main proof: for the sake of contradiction, suppose $(p|X) = 0$. Then for any subset $Y$ of $X$, we also have $(p|Y) = 0$ (because if $Y$ had a subset with coordinate sum zero in each direction, then $X$ would also have that subset). We now claim that $(3p|X) = 0$ as well: indeed, if there were a subset $Y$ of $X$ of size $3p$ with sum of the coordinates in $Y$ divisible by $p$, then we must have $(p|Y) > 0$ by Lemma 86, which is a contradiction. So in the equation

\[ 1 - (p|X) + (2p|X) - (3p|X) \equiv 0 \pmod{p} \]

that we derived in general last time (there are no more terms because $4p$ is already larger than $|X|$), we can plug in $(p|X) = 0$ and $(3p|X) = 0$ to get

\[ (2p|X) \equiv -1 \pmod{p}. \]

But we can do even more from here: if we consider any subset $Y \subseteq X$ of size $3p - 2$, we have $(p|Y) = 0$, so the analogous equation also tells us that

\[ 1 - (p|Y) + (2p|Y) \equiv 0 \pmod{p} \implies (2p|Y) \equiv -1 \pmod{p}. \]


We'll now do a double-counting argument: we count the number of triples $(I, Y, X)$ modulo $p$ (with $X$ our fixed set) such that $I \subseteq Y \subseteq X$ and $I, Y, X$ have size $2p$, $3p - 2$, and $4p - 2$, respectively, and such that $I$ satisfies the divisibility condition (but $Y$ doesn't have any such requirement). On the one hand, this count is (modulo $p$)

\[ (2p|X) \binom{2p-2}{p-2} \equiv -\binom{2p-2}{p-2}, \]

because we pick one of the subsets $I$ of size $2p$ with the divisibility condition, and then to get $Y \supseteq I$ we pick $p - 2$ of the remaining $2p - 2$ elements. (We then use the fact that $(2p|X) \equiv -1 \pmod{p}$ from above.) On the other hand, this count is also (still modulo $p$)

\[ \binom{4p-2}{3p-2} \cdot (-1) = -\binom{4p-2}{3p-2}, \]

because we pick any subset $Y$ of size $3p - 2$ and then count how many $2p$-element subsets will work for that $Y$ (and it’s always $-1$ modulo $p$ for any $Y$). So these two quantities should be equal modulo $p$, meaning (after using the symmetry $\binom{N}{k} = \binom{N}{N-k}$ on both sides) that $\binom{2p-2}{p} \equiv \binom{4p-2}{p} \pmod{p}$. Expanding out the binomial coefficients, this means

\[ \frac{(2p-2)(2p-3) \cdots (p)(p-1)}{p!} \equiv \frac{(4p-2) \cdots (3p)(3p-1)}{p!} \pmod{p}. \]

It's dangerous to divide by $p$ here when we have a congruence statement mod $p$, so let’s first cancel the factors of $p$ within each fraction:

\[ \frac{(2p-2)(2p-3) \cdots (p+1)(p-1)}{(p-1)!} \equiv 3 \cdot \frac{(4p-2)(4p-3) \cdots (3p+1)(3p-1)}{(p-1)!} \pmod{p}. \]

But now both fractions go through all nonzero residue classes on the numerator and denominator, so they completely cancel out, and we're left with

\[ 1 \equiv 3 \pmod{p}, \]

which is a contradiction because $p$ is an odd prime. Thus we must have $(p|X) \ne 0$ at the start, as desired.
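The two binomial residues in this argument are quick to confirm numerically. Here is a minimal Python check (the function name is our own) using `math.comb` from the standard library.

```python
from math import comb


def double_count_residues(p):
    """Residues modulo p of the binomial coefficients on the two sides of the double count."""
    left = comb(2 * p - 2, p - 2) % p
    right = comb(4 * p - 2, 3 * p - 2) % p
    return left, right
```

For an odd prime $p$, the argument above says the pair should be $(1, 3 \bmod p)$, and these two residues differ, which is exactly the contradiction $1 \equiv 3$.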


Remark 87. In this proof, even though the congruence statements only required that $(p|X) \equiv 0 \pmod{p}$, we do actually use the fact that we're assuming (for contradiction) that $(p|X)$ is exactly zero, because we need $(p|Y) = 0$ for subsets $Y$ of $X$ as well.


We can notice that this proof does not work for $|X| = 4p - 3$, because if we try the same size $|Y| = 3p - 2$, the binomial coefficients become $\binom{2p-3}{p-2} = \binom{2p-3}{p-1}$ and $\binom{4p-3}{3p-2} = \binom{4p-3}{p-1}$, and those both evaluate to 0 mod $p$, so we don’t get a contradiction. On the other hand, if we try to use $|Y| = 3p - 3$ instead, our lemma from the combinatorial nullstellensatz, which requires $|Y| \ge 3p - 2$, no longer applies. But we're now ready to consider the more difficult case of $|X| = 4p - 3$, and we'll establish a stronger lemma:


Lemma 88
For any subset $Y \subseteq \mathbb{Z}^2$ of size $|Y| \ge 3p - 3$, we have

\[ 1 - (p-1|Y) - (p|Y) + (2p-1|Y) + (2p|Y) - (3p-1|Y) - (3p|Y) + \cdots \equiv 0 \pmod{p}. \]

In other words, we take the subset sizes that are either 0 or $-1$ mod $p$ in a grouped alternating sum.
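As with the earlier lemma, the grouped alternating sum can be checked by brute force for small primes. The following Python sketch uses our own function names and random test sets (assumptions for illustration only), and the sign pattern follows the statement above: the sizes $jp - 1$ and $jp$ both get sign $(-1)^j$.

```python
import random
from itertools import combinations


def count_zero_sum_subsets(k, Y, p):
    """(k|Y): number of k-element subsets of Y with both coordinate sums divisible by p."""
    return sum(
        1
        for I in combinations(Y, k)
        if sum(a for a, _ in I) % p == 0 and sum(b for _, b in I) % p == 0
    )


def grouped_alternating_sum(Y, p):
    """1 - (p-1|Y) - (p|Y) + (2p-1|Y) + (2p|Y) - ... as an integer."""
    total = 0
    for j in range(len(Y) // p + 2):
        sign = (-1) ** j
        for k in (j * p - 1, j * p):
            if 0 <= k <= len(Y):
                total += sign * count_zero_sum_subsets(k, Y, p)
    return total


def random_point_set(size, rng, bound=50):
    """A sorted list of `size` distinct integer points in [0, bound)^2."""
    points = set()
    while len(points) < size:
        points.add((rng.randrange(bound), rng.randrange(bound)))
    return sorted(points)
```

For any `Y` with at least $3p - 3$ points, `grouped_alternating_sum(Y, p) % p` should be 0.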


Proof idea of lemma. With the original lemma, we encoded the divisibility condition on the subset size with a degree $(p-1)$ polynomial. But now that we are allowing our subsets of $Y$ to have sizes in two different residue classes, we only need a degree $(p-2)$ polynomial to encode that: specifically, change the polynomial $Q$ to

\[ Q(x_1, \dots, x_n) = \left(1 - (x_1 a_1 + \cdots + x_n a_n)^{p-1}\right) \left(1 - (x_1 b_1 + \cdots + x_n b_n)^{p-1}\right) \prod_{s=1}^{p-2} (x_1 + \cdots + x_n - s) \]

(where we should remember that $Y = \{(a_1, b_1), \dots, (a_n, b_n)\}$). So this third factor vanishes exactly when the size of our subset, encoded by $|x|$, is not 0 or $-1$ mod $p$, which is what the lemma statement suggests. The total degree of $Q$ is then at most $3p - 4$, and the rest of the proof (subtracting the $\delta_y$s and using the combinatorial nullstellensatz) follows in the same way as the simpler version with $|Y| \ge 3p - 2$.

Proof of Reiher’s result for $|X| = 4p - 3$. Again, suppose for the sake of contradiction that $(p|X) = 0$, so that we still have $(p|Y) = 0$ for all subsets $Y$ of $X$, and it’s still true that $(3p|X) = 0$ by Lemma 86. Now by Lemma 88, for any subset $Y \subseteq X$ of size $|Y| = 3p - 3$, we have

\[ 1 - (p-1|Y) - (p|Y) + (2p-1|Y) + (2p|Y) = 1 - (p-1|Y) + (2p-1|Y) + (2p|Y) \equiv 0 \pmod{p}. \]


Last time, we just had $1 + (2p|Y) \equiv 0 \pmod{p}$, and we added up $(2p|Y)$ across all possible subsets $Y$. We'll use that same idea here: summing this congruence relation over all subsets $Y$ of size $3p - 3$ of $X$ gives us

\[ \binom{4p-3}{3p-3} - \sum_Y (p-1|Y) + \sum_Y (2p-1|Y) + \sum_Y (2p|Y) \equiv 0 \pmod{p}. \]

But now each of these terms can be written in terms of $X$: any subset of $X$ of size $p - 1$ with sum 0 in each coordinate shows up in $\binom{3p-2}{2p-2} = \binom{3p-2}{p}$ different terms of the sum over $Y$ (we choose which $p$ elements not in our $(p-1)$-element subset make up the complement $X \setminus Y$), and similar arguments work for the other terms:

\[ \binom{4p-3}{3p-3} - (p-1|X) \binom{3p-2}{p} + (2p-1|X) \binom{2p-2}{p} + (2p|X) \binom{2p-3}{p} \equiv 0 \pmod{p}. \]

Rewriting the first term as $\binom{4p-3}{p}$, we can basically use the same trick as we did in the last proof: for example, $\binom{4p-3}{p} \equiv 3 \pmod{p}$ because the $3p$ factor cancels with the $p$ when we expand out the factorials, and then the numerator and denominator contain all nonzero residue classes. This argument eventually gives us

\[ \boxed{3 - 2(p-1|X) + (2p-1|X) + (2p|X) \equiv 0 \pmod{p}}. \]
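The four binomial coefficients entering the boxed congruence can be reduced modulo $p$ numerically. This short Python sketch (the function name is ours) returns their residues in the order in which they appear above.

```python
from math import comb


def summed_congruence_coefficients(p):
    """Residues mod p of C(4p-3, 3p-3), C(3p-2, p), C(2p-2, p), C(2p-3, p)."""
    return (
        comb(4 * p - 3, 3 * p - 3) % p,
        comb(3 * p - 2, p) % p,
        comb(2 * p - 2, p) % p,
        comb(2 * p - 3, p) % p,
    )
```

For odd primes the residues should be $3, 2, 1, 1$ (reduced modulo $p$), matching the coefficients $3$, $-2$, $+1$, $+1$ in the boxed congruence.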


But we also know that for every subset $Y \subseteq X$ of size $3p - 2$ or $3p - 1$, our original lemma gives us

\[ 1 - (p|Y) + (2p|Y) \equiv 0 \pmod{p} \implies (2p|Y) \equiv -1 \pmod{p}. \]

We now do another double-counting argument (which might seem unmotivated): we count the number of partitions $X = A \cup B \cup C$ such that $|A| = p - 1$, $|B| = p - 2$, $|C| = 2p$, and $A$ and $C$ satisfy the coordinate sum divisibility constraint (corresponding to the terms $(p-1|X)$ and $(2p|X)$). We can count these by first picking $A$ or by first picking $B$. If we start by choosing $A$, we have (modulo $p$) $-(p-1|X)$ ways to form a partition, because the complement $A^c$ of $A$ always has size $3p - 2$, and then within that set we always have $(2p|A^c) \equiv -1 \pmod{p}$ ways to choose $C$. On the other hand, if we start by choosing $B$, we have $-(3p-1|X)$ ways modulo $p$, because we need $B$'s complement $B^c = A \cup C$ (of size $3p - 1$) to satisfy the divisibility constraint, and then after that there are always $(2p|B^c) \equiv -1 \pmod{p}$ ways to choose $C$ (and $A = B^c \setminus C$ then automatically satisfies the constraint). Thus we find that

\[ -(p-1|X) \equiv -(3p-1|X) \pmod{p} \implies (p-1|X) \equiv (3p-1|X) \pmod{p}. \]


So now if we directly apply Lemma 88 to $X$, we find (using $(p|X) = (3p|X) = 0$, and also using the relation above)

\[ \begin{aligned} 0 &\equiv 1 - (p-1|X) - (p|X) + (2p-1|X) + (2p|X) - (3p-1|X) - (3p|X) \\ &\equiv 1 - (p-1|X) - 0 + (2p-1|X) + (2p|X) - (p-1|X) - 0 \\ &\equiv \boxed{1 - 2(p-1|X) + (2p-1|X) + (2p|X)} \pmod{p}. \end{aligned} \]

But comparing the two boxed conditions tells us that $1 \equiv 3 \pmod{p}$, so we again get our contradiction.


18 April 12, 2022


We've been discussing Erdős-Ginzburg-Ziv constants for the last class or two. Recall that determining $s(\mathbb{Z}_m^n)$, the minimum number of points in $\mathbb{Z}^n$ required to find $m$ points whose average is a lattice point, is an open question in general but known for some special cases (like $n = 1, 2$ or $m = 2^\ell$). We've mentioned previously that we have some general bounds

\[ (m-1)2^n + 1 \le s(\mathbb{Z}_m^n) \le (m-1)m^n + 1, \qquad s(\mathbb{Z}_{mk}^n) \le k(s(\mathbb{Z}_m^n) - 1) + s(\mathbb{Z}_k^n), \]

which in particular tell us that it’s mostly interesting to consider the cases where $m$ is prime.


Fact 89 


It turns out that for any fixed dimension $n$, $s(\mathbb{Z}_m^n)$ grows linearly with $m$. A linear lower bound is easy from our bound $(m-1)2^n + 1 \le s(\mathbb{Z}_m^n)$ above, but the first linear upper bound shown was that $s(\mathbb{Z}_m^n) \le (cn \log_2 n)^n \cdot m$ for some absolute constant $c$ (interesting in the range where $m$ is very large and $n$ is held fixed), shown by Alon and Dubiner in 1995. The result has recently been improved: it’s been shown by Zakharov in 2020 that $s(\mathbb{Z}_p^n) \le 4^n \cdot p$ if $p$ is a prime that is sufficiently large with respect to $n$. (But because this only covers sufficiently large primes, we can’t use the recursive bound $s(\mathbb{Z}_{mk}^n) \le k(s(\mathbb{Z}_m^n) - 1) + s(\mathbb{Z}_k^n)$ to get a clean bound for all $m$.) We were originally going to discuss the proof of $s(\mathbb{Z}_m^n) \le (cn \log_2 n)^n \cdot m$, but we'll move along to stay on track with the syllabus.


We can also consider the opposite situation, where we fix $m$ and let the dimension $n$ get large (in which case the problem is less well understood). Assuming that $m$ is prime (again, this is the most interesting case because of the recursive bound), we can think about small values of $m$. We already know that $s(\mathbb{F}_2^n) = 2^n + 1$, and it turns out that computing $s(\mathbb{Z}_3^n)$ is equivalent to the famous cap-set problem, which (together with the slice rank polynomial method) is the next topic of our class.


Problem 90 (Cap-set problem) 


What is the largest size of a subset of $\mathbb{F}_3^n$ without three points on an affine line?


In particular, notice that every affine line in $\mathbb{F}_3^n$ contains exactly three points, so our question is equivalently “how large can we make a subset $A \subseteq \mathbb{F}_3^n$ where $A$ does not contain an entire line,” and in affine geometry “cap-set” refers to exactly a set $A$ of this form. An easy lower bound to establish is $2^n$ by using $A = \{0, 1\}^n$. To see why this set does not contain three points on a line, we can verify that distinct points $x, y, z$ are on a line if and only if $(x - y) - (y - z) = x + y + z = 0$ (the first equality holds specifically in characteristic 3), and if we have three distinct points in $\{0, 1\}^n$ their sum will not be zero. And we also have the upper bound $\frac{3^n + 1}{2}$, because if we look at all of the lines through a given point in the set $A$, those lines partition the remaining points in $\mathbb{F}_3^n$ into pairs, and we can only have one of each pair. But even though we have exponential lower and upper bounds for this problem, we still don’t actually have a satisfying upper bound because the size of the total set is $3^n$:
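The line criterion $x + y + z = 0$ makes cap sets easy to test by computer. Here is a small Python sketch (the function names are our own, and the exhaustive search is only feasible for very small $n$) that checks whether a set of distinct points is a cap set and finds the largest cap set by brute force.

```python
from itertools import combinations, product


def is_cap_set(points):
    """In F_3^n, three distinct points lie on a line exactly when they sum to zero."""
    return not any(
        all((a + b + c) % 3 == 0 for a, b, c in zip(x, y, z))
        for x, y, z in combinations(points, 3)
    )


def max_cap_set_size(n):
    """Largest cap set in F_3^n by exhaustive search over all subsets."""
    space = list(product(range(3), repeat=n))
    for size in range(len(space), 0, -1):
        if any(is_cap_set(subset) for subset in combinations(space, size)):
            return size
    return 0
```

For example, `is_cap_set(list(product((0, 1), repeat=n)))` tests the lower-bound construction $\{0,1\}^n$ described above.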


Fact 91 
The best known lower bound is approximately $(2.2174\cdots)^n$ (due to Edel in 2004), and the first upper bound that tends to a negligible fraction of the whole set was $O(3^n/n)$ (due to Meshulam in 1995). This bound was later improved to $3^n/n^{1+c}$ for some small constant $c > 0$ (due to Bateman and Katz in 2012; even this tiny improvement was considered a big breakthrough). And most recently, Ellenberg and Gijswijt showed in 2017 that we can actually get a bound of $2.756^n$, the first upper bound with an exponential base smaller than 3.

Ellenberg and Gijswijt’s result can be stated more generally as follows:


Theorem 92 (Ellenberg—Gijswijt (2017)) 


Let $p \ge 3$ be a prime, and let $A \subseteq \mathbb{F}_p^n$ be a set not containing any nontrivial 3-term arithmetic progressions (meaning that there are no distinct $x, y, z \in A$ such that $x - 2y + z = 0$). Then $|A| \le (\Gamma_p)^n$, where

\[ \Gamma_p = \min_{0 < t < 1} \frac{1 + t + \cdots + t^{p-1}}{t^{(p-1)/3}} \]

is a constant depending only on $p$ and strictly smaller than $p$.


(Notice that this fraction $\frac{1 + t + \cdots + t^{p-1}}{t^{(p-1)/3}}$ approaches infinity as $t \to 0$, equals $p$ at $t = 1$, and is continuous on $(0, 1]$. But because the derivative is positive at $t = 1$, we must have a minimum achieved strictly between 0 and 1. The calculus might be a little annoying to check but is not too important.) The proof of this result very importantly relies on another result by Croot, Lev, and Pach for $\mathbb{Z}_4^n$, and Ellenberg and Gijswijt independently figured out how to modify that result for $\mathbb{F}_p^n$ (it was not immediately clear how to do this). And Tao later reformulated the argument in a more general form (this version is now called the slice rank polynomial method); that’s the proof we'll be following in this class. All of these developments occurred within a month or so of each other!
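To get a feel for the constant, we can approximate $\Gamma_p$ numerically. The sketch below is our own illustration: it uses a plain grid search over $(0, 1]$ (an assumption made for simplicity, not a claim about how the constant is usually computed), which is accurate enough to see the first few digits.

```python
def gamma(p, steps=20_000):
    """Grid-search approximation of the minimum of (1 + t + ... + t^(p-1)) / t^((p-1)/3) on (0, 1]."""
    best = float("inf")
    for i in range(1, steps + 1):
        t = i / steps
        value = sum(t ** j for j in range(p)) / t ** ((p - 1) / 3)
        best = min(best, value)
    return best
```

For $p = 3$ the minimizer can also be found by hand: setting the derivative to zero gives $4t^2 + t - 2 = 0$, so $t = (\sqrt{33} - 1)/8$, and `gamma(3)` should agree with the value of the fraction at that point.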

We've seen various “polynomial methods” in this class (such as dimension counting, counting zeros with the joints 
and Kakeya problems, and the combinatorial nullstellensatz). This one is a new one and was novel enough that there 
were cool applications applying it in the months following those developments. To use it, we'll need to first define 


what “slice rank” means: 


Definition 93 

Let $\mathbb{F}$ be a field, $A$ be a finite set (indexed in any way), and $k \ge 2$ be an integer. A function $f : A^k \to \mathbb{F}$ has slice rank 1 if it can be written as $f(x_1, \dots, x_k) = g(x_j) h(x_1, \dots, x_{j-1}, x_{j+1}, \dots, x_k)$ for nonzero functions $g : A \to \mathbb{F}$ and $h : A^{k-1} \to \mathbb{F}$ and some $1 \le j \le k$. Then the slice rank of $f$ is the minimum $r$ such that $f$ can be written as the sum of $r$ slice rank 1 functions (with potentially different $j$s).


Remark 94. A function $f : A^k \to \mathbb{F}$ is also called a $k$-tensor. That word sometimes has scary connotations because of commutative algebra, but we can also think of it as a “hypermatrix” because, for example, a function $f : A^2 \to \mathbb{F}$ can be represented as an $|A|$ by $|A|$ matrix which just encodes the numbers $f(i, j)$.


Example 95 
The slice rank of a function $f : A^k \to \mathbb{F}$ is always at most $|A|$ (this is kind of like how an $n \times n$ matrix can only be of rank at most $n$). To see this, look at the “horizontal slices” of our hypercube $A^k$: letting $\delta_a$ be the indicator function for $x_1 = a$ (for any $a \in A$), we have

\[ f(x_1, \dots, x_k) = \sum_{a \in A} \delta_a(x_1) f(a, x_2, \dots, x_k), \]

where each term in the sum is of slice rank at most 1.


Note that the slice rank is not the same as the ordinary tensor rank (in which a rank 1 tensor looks like $g_1(x_1) g_2(x_2) \cdots g_k(x_k)$). Every rank 1 tensor also has slice rank 1, so the slice rank is always at most the tensor rank, but the most general upper bound for the rank of a $k$-tensor is $|A|^k$. On the other hand, we can check that for $k = 2$, the slice rank is the same as the ordinary rank of a matrix. Just as diagonal $n \times n$ matrices with nonzero diagonal entries have rank $n$, we have an analogous result for our functions:


Lemma 96 (Tao) 


Suppose $f : A^k \to \mathbb{F}$ is such that $f(x_1, \dots, x_k) \ne 0$ if and only if $x_1 = \cdots = x_k$. Then the slice rank of $f$ is $|A|$.


Proof. We induct on $k$. For $k = 2$, this result is equivalent to the fact that diagonal matrices with nonzero diagonal entries have full rank. For the inductive step, because the slice rank of any function is at most $|A|$, we can assume for the sake of contradiction that the slice rank is less than $|A|$. Then we can write $f$ as a sum of fewer than $|A|$ slice rank 1 functions

\[ \boxed{ f(x_1, \dots, x_k) = \sum_{\alpha \in M_1} g_\alpha(x_1) h_\alpha(x_2, \dots, x_k) + \sum_{\alpha \in M_2} g_\alpha(x_2) h_\alpha(x_1, x_3, \dots, x_k) + \cdots + \sum_{\alpha \in M_k} g_\alpha(x_k) h_\alpha(x_1, \dots, x_{k-1}) } \]

where the $M_j$ are disjoint index sets with $|M_1| + \cdots + |M_k| < |A|$ (in other words, each slice rank 1 function in the sum is of the form $g_\alpha(x_j) h_\alpha(x_1, \dots, x_{j-1}, x_{j+1}, \dots, x_k)$ for some $j$, and so we index the functions of this form with this particular $j$ by $M_j$). So now consider the space of all functions $\phi : A \to \mathbb{F}$ such that

\[ \sum_{x \in A} \phi(x) g_\alpha(x) = 0 \quad \text{for all } \alpha \in M_k; \]

in other words, we want our functions $\phi$ to be orthogonal to all of the functions $g_\alpha$ appearing in terms of the form $g_\alpha(x_k) h_\alpha(x_1, \dots, x_{k-1})$ in our sum for $f$. This space has dimension at least $|A| - |M_k|$ (we have at most $|M_k|$ linearly independent constraints on $\phi$), so there is some subset $A'$ of size at least $|A| - |M_k|$ and some $\phi$ in this space such that $\phi(x)$ is nonzero for all $x \in A'$. (This is because the conditions $\sum_{x \in A} \phi(x) g_\alpha(x) = 0$ can be thought of as linear equations in the variables $\phi(x)$: row reducing and taking $A'$ to be the set of free variables, we can let $\phi$ be the solution which takes value 1 on all free variables.) Now defining the function $f' : (A')^{k-1} \to \mathbb{F}$ via

\[ f'(x_1, \dots, x_{k-1}) = \sum_{x_k \in A} f(x_1, \dots, x_k) \phi(x_k), \]

this function is indeed diagonal because $f$ is diagonal, and it has nonzero entries on the diagonal because $f$ is nonzero on the diagonal and $\phi(x) \ne 0$ on $A'$. Thus by the inductive hypothesis, the slice rank of $f'$ is $|A'| \ge |A| - |M_k|$. But on the other hand, multiplying the boxed equation by $\phi(x_k)$ and then summing over all $x_k \in A$ gives us

\[ f'(x_1, \dots, x_{k-1}) = \sum_{\alpha \in M_1} g_\alpha(x_1) \Big( \sum_{x_k \in A} h_\alpha(x_2, \dots, x_k) \phi(x_k) \Big) + \cdots + \sum_{\alpha \in M_{k-1}} g_\alpha(x_{k-1}) \Big( \sum_{x_k \in A} h_\alpha(x_1, \dots, x_{k-2}, x_k) \phi(x_k) \Big) \]

(the last term vanishes by the definition of $\phi$). So the slice rank of $f'$ is at least $|A| - |M_k|$ but is also at most $|M_1| + \cdots + |M_{k-1}|$, giving us a contradiction because we assumed that $|M_1| + \cdots + |M_k| < |A|$.


Next lecture, we'll see how we can use this lemma to prove Ellenberg and Gijswijt’s result! 


19 April 14, 2022 


We'll be proving Ellenberg and Gijswijt’s result today. Recall that it states that a subset $A$ of $\mathbb{F}_p^n$ that does not contain a nontrivial 3-term arithmetic progression satisfies $|A| \le (\Gamma_p)^n$, where $\Gamma_p = \min_{0 < t < 1} \frac{1 + t + \cdots + t^{p-1}}{t^{(p-1)/3}}$. As we discussed last time, the special case $p = 3$ is the cap-set problem, and this result gives us an exponentially better bound than what was previously known for the problem. And for some numerical reference, it turns out that $0.8414p \le \Gamma_p \le 0.9184p$.


Fact 97 
Recall that a “nontrivial three-term progression” means that we have three distinct elements $x, y, z$ with $x - 2y + z = 0$. But the “wrap-around” nature of $\mathbb{F}_p^n$ for restricting three-term progressions is very important here: there are subsets of $\{1, \dots, N\}$ in $\mathbb{Z}$ without a three-term arithmetic progression of size at least $N e^{-c\sqrt{\log N}}$. So looking at these sets in comparison to the ground set, in the $\mathbb{F}_p^n$ case we know that $|A|$ must be at most size $(p^n)^{1 - c_p}$ for some constant $c_p > 0$, but in the $\mathbb{Z}$ case we can have $|A|$ of size $N^{1 - o(1)}$.


We'll be using the slice rank polynomial method to prove this result. Recall that a function $f : A^k \to \mathbb{F}$ has slice rank 1 if we can write it as a product $f(x_1, \dots, x_k) = g(x_j) h(x_1, \dots, x_{j-1}, x_{j+1}, \dots, x_k)$ for some $j$, and generally a function's slice rank is the minimum number of slice rank 1 functions that must be added together to get $f$. We proved last time (by “horizontal slicing”) that the slice rank is always at most $|A|$ and that it is exactly $|A|$ for a “diagonal” function $f$ (in which the values $f(x_1, \dots, x_k)$ are nonzero exactly when $x_1 = \cdots = x_k$).


Proof of Ellenberg-Gijswijt's result. Suppose we have such a set $A$. For any point $x \in A \subseteq \mathbb{F}_p^n$, we'll write $x = (x^{(1)}, \dots, x^{(n)})$ with upper indices to avoid conflicting notation. We need to construct a “tensor” (function), and specifically we'll do so with our set $A$ which contains no nontrivial three-term arithmetic progressions: define the function $f : A \times A \times A \to \mathbb{F}_p$ given by

\[ f(x, y, z) = \prod_{i=1}^{n} \left( \left(x^{(i)} - 2y^{(i)} + z^{(i)}\right)^{p-1} - 1 \right). \]

Because $x, y, z$ form an arithmetic progression (in that order) if and only if $x - 2y + z = 0$, this is the same as saying that $x^{(i)} - 2y^{(i)} + z^{(i)} = 0$ for all $i$, and then we’re playing the usual game from the combinatorial nullstellensatz proofs: if any of these coordinate expressions is nonzero, $f$ will be 0, but if we do have an arithmetic progression it will be $(-1)^n$.

But by assumption $A$ has no nontrivial 3-term arithmetic progressions, so the only way for $f$ to be nonzero is to have a 3-term arithmetic progression with a repeated term, and because $p \ge 3$ this only occurs if $x = y = z$ (we have to be careful here: something like $1, 0, 1$ would work for $p = 2$!). In fact, $f(x, y, z)$ is indeed a “diagonal” function of the sort we talked about, because $f(x, x, x) = (-1)^n$ for all $x \in A$ and otherwise $f(x, y, z) = 0$. Thus $f$ has slice rank $|A|$ by our lemma.
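The diagonal property can be checked by brute force on a tiny example. The sketch below is only an illustration, not part of the proof: it assumes the progression-free set $A = \{0,1\}^2 \subseteq \mathbb{F}_3^2$ (our choice of example), and the helper name `f` simply mirrors the function defined above.

```python
from itertools import product

def f(x, y, z, p):
    """Product over coordinates of ((x_i - 2*y_i + z_i)^(p-1) - 1), reduced mod p."""
    value = 1
    for xi, yi, zi in zip(x, y, z):
        value = value * (pow(xi - 2 * yi + zi, p - 1, p) - 1) % p
    return value

p, n = 3, 2
# Assumed example: {0,1}^n contains no nontrivial 3-term progression in F_3^n.
A = list(product([0, 1], repeat=n))

# f should vanish off the diagonal and equal (-1)^n mod p on the diagonal.
diagonal_ok = all(
    (f(x, y, z, p) != 0) == (x == y == z)
    for x in A for y in A for z in A
)
diagonal_values = {f(x, x, x, p) for x in A}
print(diagonal_ok, diagonal_values)
```

Here the values are reported as residues in $\{0, \dots, p-1\}$, so $(-1)^n$ shows up as its representative mod $p$.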

It may not seem like we've made very much progress, but now the polynomial nature of $f$ will give us an upper bound for the slice rank. Specifically, because $f$ does not have a very high degree (at most $(p-1)n$), we will be able to write $f$ explicitly as a sum of slice rank 1 functions, which will be how we get the bound on $|A|$. In particular, a degree $(p-1)n$ polynomial is a sum of monomials of degree at most $(p-1)n$ in the variables $x^{(1)}, \dots, x^{(n)}, y^{(1)}, \dots, y^{(n)}, z^{(1)}, \dots, z^{(n)}$, and here any individual variable has degree at most $p - 1$. Each monomial has its degree distributed between the $x$'s, $y$'s, and $z$'s, so it must have degree at most $\frac{(p-1)n}{3}$ in either the $x$'s, the $y$'s, or the $z$'s. Explicitly, the ones where the degree is at most $\frac{(p-1)n}{3}$ in the $x$'s look like $(x^{(1)})^{d_1} \cdots (x^{(n)})^{d_n}$ times a monomial in $y$ and $z$ (where $d_1 + \cdots + d_n \le \frac{(p-1)n}{3}$), and there are similar expressions for the groups for the $y$'s and $z$'s. The idea is that we'll group (by the distributive law) all of the terms with the same $d_i$s in $x$, and do the same grouping for the $y$'s and for the $z$'s. This means that we can write $f$ as a sum of functions of the form

$$(x^{(1)})^{d_1} \cdots (x^{(n)})^{d_n} \cdot (\text{polynomial}(y, z)),$$

plus also similar expressions $(y^{(1)})^{d_1} \cdots (y^{(n)})^{d_n} \cdot (\text{polynomial}(x, z))$ and $(z^{(1)})^{d_1} \cdots (z^{(n)})^{d_n} \cdot (\text{polynomial}(x, y))$. The point is that these are valid functions of slice rank at most 1 (at most, in case the polynomial is zero): each term is a function of $x$ times a polynomial in $y$ and $z$, or a function of $y$ times a polynomial in $x$ and $z$, or a function of $z$ times a polynomial in $x$ and $y$. (Importantly, we couldn't have split off a single coordinate such as $x^{(1)}$ times a polynomial in all the other coordinates, because that wouldn't separate the whole vector $x$ from the other variables.) So overall, we find that the slice rank of $f$ can be bounded as


$$|A| = \text{slice rank}(f) \le 3 \cdot \left| \left\{ (d_1, \dots, d_n) \in \{0, \dots, p-1\}^n : d_1 + \cdots + d_n \le \frac{(p-1)n}{3} \right\} \right|,$$

since we get a slice rank 1 function for every valid tuple of exponents $d_i$. The rest is computation (we're going to lose a factor of about $\sqrt{n}$ to get to the nicer-looking $(\Gamma_p)^n$, but it generally isn't seen as very important): we'll show that $|A| \le 3(\Gamma_p)^n$ with the following lemma:


Lemma 98 


Pick a uniformly random $n$-tuple $(d_1, \dots, d_n) \in \{0, \dots, p-1\}^n$. Then the probability that $d_1 + \cdots + d_n \le \frac{(p-1)n}{3}$ is at most $(\Gamma_p / p)^n$.


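Proof of lemma. Since $\Gamma_p$ is the minimum of $\frac{1 + t + \cdots + t^{p-1}}{t^{(p-1)/3}}$ over $t \in (0, 1)$, it suffices to prove that the probability is at most $\left( \frac{1}{p} \cdot \frac{1 + t + \cdots + t^{p-1}}{t^{(p-1)/3}} \right)^n$ for any such $t$. Indeed, fixing some $t$,

$$\mathbb{P}\left[ d_1 + \cdots + d_n \le \frac{(p-1)n}{3} \right] = \mathbb{P}\left[ t^{d_1 + \cdots + d_n} \ge t^{(p-1)n/3} \right]$$

because $t < 1$, and then by Markov's inequality (and the independence of the $d_i$) we can bound this by

$$\frac{\mathbb{E}\left[ t^{d_1 + \cdots + d_n} \right]}{t^{(p-1)n/3}} = \frac{\mathbb{E}\left[ t^{d_1} \right]^n}{\left( t^{(p-1)/3} \right)^n} = \frac{\left( \frac{1 + t + \cdots + t^{p-1}}{p} \right)^n}{\left( t^{(p-1)/3} \right)^n} = \left( \frac{1}{p} \cdot \frac{1 + t + \cdots + t^{p-1}}{t^{(p-1)/3}} \right)^n.$$

Since this holds for any $t \in (0, 1)$, it holds for the minimizing $t$, which gives us

$$\mathbb{P}\left[ d_1 + \cdots + d_n \le \frac{(p-1)n}{3} \right] \le \left( \frac{1}{p} \Gamma_p \right)^n,$$

as desired.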
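Lemma 98 can be sanity-checked numerically for small parameters. The following is a rough sketch: `gamma` approximates $\Gamma_p$ (or $\Gamma_{p,k}$ for other $k$) by a ternary search in $u = \ln t$, and `count_low_sum` counts the exponent tuples exactly. Both function names and the search window are our own choices, not something from the lecture.

```python
import math

def gamma(p, k=3):
    """Approximate the minimum over 0 < t < 1 of (1 + t + ... + t^(p-1)) / t^((p-1)/k)."""
    def log_ratio(u):
        # With t = e^u, the log of the ratio is convex in u (log-sum-exp minus a linear term).
        t = math.exp(u)
        return math.log(sum(t ** j for j in range(p))) - (p - 1) / k * u

    lo, hi = -30.0, 0.0
    for _ in range(200):  # ternary search on a convex function
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if log_ratio(m1) < log_ratio(m2):
            hi = m2
        else:
            lo = m1
    return math.exp(log_ratio((lo + hi) / 2))

def count_low_sum(p, n, k=3):
    """Number of (d_1, ..., d_n) in {0, ..., p-1}^n with d_1 + ... + d_n <= (p-1)n/k."""
    ways = [1]  # ways[s] = number of tuples built so far with coordinate sum s
    for _ in range(n):
        new = [0] * (len(ways) + p - 1)
        for s, w in enumerate(ways):
            for d in range(p):
                new[s + d] += w
        ways = new
    bound = (p - 1) * n // k  # sums are integers, so rounding down loses nothing
    return sum(ways[: bound + 1])

for p, n in [(3, 6), (5, 6), (7, 5)]:
    print(p, n, count_low_sum(p, n), gamma(p) ** n)
```

In each printed row the exact count should sit below $(\Gamma_p)^n$, which is the content of the lemma after multiplying through by $p^n$.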
By Lemma 98, the fraction of all $p^n$ possible $n$-tuples $(d_1, \dots, d_n)$ that are valid is at most $(\Gamma_p / p)^n$. So there are at most $(\Gamma_p)^n$ terms for each of the $x$, $y$, and $z$ types of slice rank 1 functions, giving us $|A| \le 3(\Gamma_p)^n$ overall. To remove the factor of 3, the idea is to now use the power trick: notice that if $A$ is a 3-term progression-free subset of $\mathbb{F}_p^n$, then $A^m$ is also 3-term progression-free (where we're thinking of $A^m$ as a subset of $(\mathbb{F}_p^n)^m \cong \mathbb{F}_p^{nm}$, so we're avoiding 3-term arithmetic progressions of $(nm)$-dimensional vectors). Thus applying the bound we have already proved, we find that

$$|A|^m \le 3(\Gamma_p)^{nm} \implies |A| \le 3^{1/m} (\Gamma_p)^n,$$

and taking $m$ arbitrarily large (taking the infimum) implies that $|A| \le (\Gamma_p)^n$, completing the proof.

Fact 99
There is still a gap between the currently known lower and upper bounds, and in particular it's not actually known whether this upper bound is tight. And even though the Markov bound in our lemma might look weak, it turns out it's not necessarily a bad bound. In particular, a generalization of this problem (the tri-colored sum-free theorem) is actually tight with its corresponding $\Gamma_p$; the argument there involves coupling the probability distribution with itself. (And there's a $k$-colored version of all of this too, in which the bound is also tight; this was shown in a paper coauthored by Professor Sauermann.)


Notice that Lemma 98 is essentially a Chernoff bound, because we expect $d_1 + \cdots + d_n$ to be $\frac{(p-1)n}{2}$ on average. So that is the point at which we saw why we needed the polynomial $f$ to be of low degree: we needed it to have degree smaller than $\frac{3}{2}(p-1)n$, so that a third of the degree falls below this average. But that also explains why this proof method does not work for 4-term progressions, which essentially require two equations in four variables (such as $x - 2y + z = 0$, $y - 2z + w = 0$). Then we'd have a polynomial $f(w, x, y, z)$ of degree $2(p-1)n$, but the slice-rank splitting would have four different types, each having $d_1 + \cdots + d_n \le \frac{(p-1)n}{2}$, which is not good enough. More generally, $m$ equations in $k$ variables will only give a meaningful bound if $k \ge 2m + 1$, so trying to do $k$-term arithmetic progressions will never work this way for $k \ge 4$.

To be more precise with all of this, suppose we have a single equation in $k \ge 3$ variables of the form $a_1 x_1 + \cdots + a_k x_k = 0$ (which is the equation we're trying to avoid among points $x_i \in A$). In order to use this proof method to bound $|A|$, we must have $a_1 + \cdots + a_k = 0$ (otherwise the set $A \subseteq \mathbb{F}_p^n$ of all points with first coordinate 1 will not have any solutions to $a_1 x_1 + \cdots + a_k x_k = 0$, and it has size $\frac{1}{p} \cdot p^n = p^{n-1}$); this comes up in the fact that our tensor $f : A^k \to \mathbb{F}$ must be diagonal and have nonzero entries on the diagonal. And there's another caveat as well: instead of only forbidding solutions in which the $x_i$ are all distinct, we must forbid all solutions in which they are not all equal (so we have a stronger set of conditions):


Theorem 100 (Generalization of Ellenberg–Gijswijt)
Let $p$ be a fixed prime, let $k \ge 3$, and let $a_1, \dots, a_k \in \mathbb{F}_p \setminus \{0\}$ such that $a_1 + \cdots + a_k = 0$. Suppose $A \subseteq \mathbb{F}_p^n$ is a set such that there are no points $x_1, \dots, x_k \in A$ with $a_1 x_1 + \cdots + a_k x_k = 0$ with the $x_i$ not all equal. Then $|A| \le (\Gamma_{p,k})^n$, where

$$\Gamma_{p,k} = \min_{0 < t < 1} \frac{1 + t + \cdots + t^{p-1}}{t^{(p-1)/k}} < p.$$

Last lecture, we mentioned that Ellenberg and Gijswijt's result about progression-free subsets of $\mathbb{F}_p^n$ (which we proved last time) is a special case of a more general problem. Specifically, if $A \subseteq \mathbb{F}_p^n$ is some set where there is no set of not-all-equal elements $x_1, \dots, x_k \in A$ satisfying $a_1 x_1 + \cdots + a_k x_k = 0$ (where $k \ge 3$, the $a_i$ are all nonzero, and $a_1 + \cdots + a_k = 0$), then we can bound the size of $A$ as $|A| \le (\Gamma_{p,k})^n$, where $\Gamma_{p,k} = \min_{0 < t < 1} \frac{1 + t + \cdots + t^{p-1}}{t^{(p-1)/k}}$ is some constant independent of $n$ and (importantly) strictly less than $p$. (So we have an exponentially strong bound on $|A|$ which in fact gets better for larger $k$.) Ellenberg and Gijswijt's theorem is this result with $p \ge 3$, $k = 3$, $a_1 = 1$, $a_2 = -2$, $a_3 = 1$, and the way to prove the more general statement is with essentially the same proof as last time (using the slice rank polynomial method). And we can in fact further generalize this result to systems of equations by taking multiple equations of the form $a_1 x_1 + \cdots + a_k x_k = 0$, as long as the number of variables $k$ is strictly more than twice the number of equations $m$; then, instead of $t^{(p-1)/k}$, we have $t^{m(p-1)/k}$ in the denominator for $\Gamma_{p,k}$.


Remark 101. We mentioned that our proofs didn't tell us whether the exponent bases $\Gamma_p$ and $\Gamma_{p,k}$ are tight, but that there was a similar situation in which the exponent base is known to be tight. The actual setup for that is listed below, but we won't focus on this too much.

Theorem 102 (Multi-colored sum-free theorem)
Fix a prime $p$ and let $k \ge 3$. Consider a list of $L$ $k$-tuples, indexed as $(y_{1,\ell}, \dots, y_{k,\ell}) \in \mathbb{F}_p^n \times \cdots \times \mathbb{F}_p^n$ for $\ell \in \{1, \dots, L\}$. Suppose that for any $\ell_1, \dots, \ell_k \in \{1, \dots, L\}$, we have $y_{1,\ell_1} + \cdots + y_{k,\ell_k} = 0$ if and only if $\ell_1 = \cdots = \ell_k$. Then $L \le (\Gamma_{p,k})^n$ for the same $\Gamma_{p,k}$ as above, and in fact we have a tight bound (there is no better constant than $\Gamma_{p,k}$; a lower bound of $(\Gamma_{p,k})^{n - O(\sqrt{n})}$ is known).

In other words, we can write out our $k$-tuples, placing $(y_{1,1}, y_{2,1}, \dots, y_{k,1})$ in the first row, $(y_{1,2}, y_{2,2}, \dots, y_{k,2})$ in the second row, and so on. The hypothesis then says that the sum of any row is zero, but there's no other way to take one element of the first component, one element of the second component, and so on, and have them add to 0. (Think of each component as its own color.) And it turns out that to recover our original generalization, we associate every $x \in A$ with the $k$-tuple $(a_1 x, \dots, a_k x)$.

With that, we can return to the topic of Erdős–Ginzburg–Ziv constants from earlier. If we take $k = p$ and all $a_i = 1$ in our generalization of Ellenberg–Gijswijt, we get the following result:

Theorem 103
Let $p \ge 5$ be a prime (the bound is otherwise not interesting), and let $A \subseteq \mathbb{F}_p^n$ be a subset that does not contain $x_1, \dots, x_p \in A$, not all equal, with $x_1 + \cdots + x_p = 0$. Then $|A| \le 4^n$.

(We also get a corresponding lower bound of $2^n$ from $\{0, 1\}^n$ as usual.)

Proof. Given the work we've done already, we just need to verify that $\Gamma_{p,p} < 4$. Indeed,

$$\Gamma_{p,p} = \min_{0 < t < 1} \frac{1 + t + \cdots + t^{p-1}}{t^{(p-1)/p}} < \min_{0 < t < 1} \frac{1/(1-t)}{t} = \min_{0 < t < 1} \frac{1}{t(1-t)} = 4$$

by replacing the numerator with the infinite series and noting that $t^{(p-1)/p} > t$ for $0 < t < 1$, and then finally noting that $t(1-t)$ has a maximum of $\frac{1}{4}$ at $t = \frac{1}{2}$.


Recall that we defined $s(\mathbb{Z}_m^n)$ to be the smallest $s$ such that any sequence of $s$ elements in $\mathbb{Z}_m^n$ has a subsequence of length $m$ whose elements sum to 0 in $\mathbb{Z}_m^n$. We mentioned that a motivating question was to understand the behavior of this constant when we fix $m$ and take $n$ to infinity, and in particular it suffices to understand the behavior when $m$ is prime. We've now managed to almost convert our results back to that language. However, the issue with immediately generalizing is that in Theorem 103, we only assume that $x_1, \dots, x_p$ are not all equal, while in Erdős–Ginzburg–Ziv we assume that they must all be distinct. (And the slice rank proof required our “diagonal” function to have extremely few nonzero entries, so that proof will not work directly here.)

Furthermore, in Erdős–Ginzburg–Ziv, it's okay to “repeat” elements of $\mathbb{F}_p^n$, while we haven't been allowing that in our generalization of Ellenberg–Gijswijt. But that's an easier complication to deal with. For that, let's define $s^*(\mathbb{Z}_p^n)$ to be the maximum size of a subset $A \subseteq \mathbb{F}_p^n$ in which we forbid solutions $x_1 + \cdots + x_p = 0$ with the $x_i$ all distinct (so it's like $s(\mathbb{Z}_p^n)$, but this time $A$ is a set, so no element is repeated). Then we have

$$s^*(\mathbb{Z}_p^n) + 1 \le s(\mathbb{Z}_p^n),$$

because a set $A$ of size $s^*(\mathbb{Z}_p^n)$ with no $p$ distinct elements summing to zero is in particular a sequence with no zero-sum subsequence of length $p$, so $s(\mathbb{Z}_p^n)$ must be larger than $s^*(\mathbb{Z}_p^n)$, and we also have

$$s(\mathbb{Z}_p^n) \le (p-1) s^*(\mathbb{Z}_p^n) + 1,$$

because if some vector is repeated $p$ times in our sequence, that gives us a zero-sum subsequence and we're happy in the Erdős–Ginzburg–Ziv setting, and otherwise every vector appears at most $p - 1$ times, so we have at least $s^*(\mathbb{Z}_p^n) + 1$ different vectors and thus there are $p$ of them that are distinct and sum to 0. Thus $s^*(\mathbb{Z}_p^n)$ and $s(\mathbb{Z}_p^n)$ differ by a factor of at most $p$, so up to a constant factor in the fixed $p$ regime these are basically the same.

So putting this together, we've already proved that $s(\mathbb{Z}_3^n) \ge 2^n + 1$, and from Ellenberg–Gijswijt and the argument we just made, we have $s(\mathbb{Z}_3^n) \le 2(\Gamma_3)^n + 1 \le 2 \cdot 2.756^n + 1$. (We're lucky here because “$x_1, x_2, x_3$ sum to zero and are distinct” is the same as “$x_1, x_2, x_3$ sum to zero and are not all equal.”) Now for a prime $p \ge 5$, we want the largest possible size of a subset $A \subseteq \mathbb{F}_p^n$ without distinct $x_1, \dots, x_p \in A$ summing to zero. Here's the historical rundown: in 2017 (but published in 2020), Naslund proved that $|A| \le p \cdot 2^p \cdot (\Gamma_p)^n$ (remember we can think of $\Gamma_p$ as roughly $0.9p$), by introducing the concept of partition rank. But then a better bound was proved, for which we can go over the proof now:


Theorem 104 (Fox–S. (2017))
If $A \subseteq \mathbb{F}_p^n$ does not contain $p$ distinct vectors $x_1, \dots, x_p \in A$ with $x_1 + \cdots + x_p = 0$, then $|A| \le 3(\Gamma_p)^n$. In other words, using the argument above, we have $s(\mathbb{Z}_p^n) \le 3p(\Gamma_p)^n$.


In fact, Professor Sauermann has more recently improved this bound to $|A| \le C_p (2\sqrt{p})^n$, which has base now much better than linear in $p$ but still far away from the $4^n$ behavior in Theorem 103. And that bound generalizes to the multi-colored version too and is tight there.


Proof. The idea is to first ask “how often can a given element of $A$ be the middle term of a 3-term arithmetic progression?” and reduce to the generalization of Ellenberg–Gijswijt. In particular, if there is some $x \in A$ which appears as the middle term of $\frac{p-1}{2}$ nontrivial 3-term arithmetic progressions in $A$, then the $p$ vectors appearing in these progressions are $p$ distinct vectors that add to $p \cdot x = 0$ (in particular the arithmetic progressions only overlap at $x$). So every $x \in A$ appears as the middle term of at most $\frac{p-3}{2}$ nontrivial 3-term arithmetic progressions, so there are at most $|A| \cdot \frac{p-3}{2}$ nontrivial 3-APs in $A$. (As a note, for $p = 3$ any element can be the middle element of the progression, so for that case we can just fix a choice of middle element at the start.)

Now let $H$ be a uniformly random affine hyperplane in $\mathbb{F}_p^n$, and let $X_1 = |A \cap H|$ and $X_2$ be the number of nontrivial 3-term arithmetic progressions in $A \cap H$. We have

$$\mathbb{E}[X_1] = \frac{1}{p} |A|$$

(because every point in $\mathbb{F}_p^n$ shows up in $H$ with probability $\frac{1}{p}$), and we have

$$\mathbb{E}[X_2] = \frac{1}{p} \cdot \frac{p^{n-1} - 1}{p^n - 1} \cdot (\text{number of nontrivial 3-term arithmetic progressions in } A) \le \frac{1}{p^2} \cdot |A| \cdot \frac{p-3}{2} \le \frac{1}{2p} |A|$$

(where we're finding the probability that two of the points in any given arithmetic progression are in $H$, at which point the third will also be). Thus by linearity of expectation, we have

$$\mathbb{E}[X_1 - X_2] \ge \frac{1}{p} |A| - \frac{1}{2p} |A| = \frac{1}{2p} |A|,$$

so there is some hyperplane $H$ in which $X_1 - X_2 \ge \frac{1}{2p} |A|$. Now construct a subset $B$ of $A \cap H$ in which we delete one point from each nontrivial 3-term arithmetic progression in $A \cap H$; since we have $X_1$ points and we delete at most $X_2$, $|B| \ge X_1 - X_2 \ge \frac{1}{2p} |A|$. So we can apply Ellenberg–Gijswijt now on $H$ (viewing it as isomorphic to $\mathbb{F}_p^{n-1}$, noting that affine translations preserve arithmetic progressions), and we find that because $B$ is a subset of $H$ without a nontrivial 3-term arithmetic progression,

$$\frac{1}{2p} |A| \le |B| \le (\Gamma_p)^{n-1} \implies |A| \le 2p (\Gamma_p)^{n-1}.$$

Since $\Gamma_p > 0.84p > \frac{2}{3} p$, this can be rewritten as $|A| \le 3(\Gamma_p)^n$, as desired.
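The two probabilities used for $\mathbb{E}[X_1]$ and $\mathbb{E}[X_2]$ can be confirmed by enumerating every affine hyperplane in a small space. This is only an illustrative sketch: hyperplanes are represented as pairs (nonzero normal vector $a$, offset $c$), so each hyperplane is listed $p - 1$ times, which does not affect the uniform distribution. The function names are ours.

```python
from fractions import Fraction
from itertools import product

def hyperplane_probabilities(p, n, u, v):
    """Return (P[u in H], P[u and v in H]) for a uniformly random affine hyperplane H."""
    total = hit_u = hit_both = 0
    for a in product(range(p), repeat=n):
        if not any(a):
            continue  # the zero vector is not a valid normal vector
        for c in range(p):
            total += 1
            in_u = sum(ai * ui for ai, ui in zip(a, u)) % p == c
            in_v = sum(ai * vi for ai, vi in zip(a, v)) % p == c
            hit_u += in_u
            hit_both += in_u and in_v
    return Fraction(hit_u, total), Fraction(hit_both, total)

def predicted_pair_probability(p, n):
    """The value (1/p) * (p^(n-1) - 1) / (p^n - 1) used in the proof."""
    return Fraction(1, p) * Fraction(p ** (n - 1) - 1, p ** n - 1)

print(hyperplane_probabilities(3, 2, (0, 0), (1, 2)), predicted_pair_probability(3, 2))
print(hyperplane_probabilities(5, 3, (0, 0, 0), (1, 2, 3)), predicted_pair_probability(5, 3))
```

In both lines the second enumerated fraction should agree with the predicted value, and the first should be $\frac{1}{p}$.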


On our homework assignment, we'll see a problem in which we forbid configurations where a point is the center of 


k different arithmetic progressions, and we'll prove a similar result using a probabilistic argument. 


21 April 21, 2022 


We'll discuss the Erdős–Szemerédi sunflower problem today:


Definition 105 


Three distinct sets $A, B, C$ form a sunflower if $A \cap B = A \cap C = B \cap C$.


In particular, it's okay if $A, B, C$ are pairwise disjoint (and it's okay if one of the sets is the empty set as long as the other two are disjoint), and otherwise they form a “sunflower” (not really) shape in which the only intersection between any of the sets $A, B, C$ is common to all three. The natural question to ask from this is the following (essentially forbidding certain patterns):


Problem 106 


What is the maximum possible size of a collection $\mathcal{F}$ of subsets of $\{1, \dots, n\}$ (any ground set of size $n$ works), such that $\mathcal{F}$ contains no three distinct sets $A, B, C$ that form a sunflower?


A trivial bound is $2^n$ (that's the total number of subsets of $\{1, \dots, n\}$), and the Erdős–Szemerédi sunflower conjecture predicts an exponentially better bound, namely that there is some constant $c < 2$ such that we must have $|\mathcal{F}| \le c^n$. In 2013, Alon, Shpilka, and Umans wrote a paper studying connections between problems like the sunflower problem and fast matrix multiplication (understanding how to multiply $n \times n$ matrices in time less than $O(n^3)$, motivated by the existence of certain combinatorial structures), in which they showed that the conjecture follows from the tri-colored sum-free theorem in $\mathbb{F}_3^n$. And after Ellenberg and Gijswijt published their paper, it was seen that the proof there applies to the tri-colored sum-free theorem, proving the Erdős–Szemerédi sunflower conjecture. (That first appeared in print in another paper by seven different authors, discussing yet something else; the attribution of this result is generally a little complicated.)

However, it's interesting to think about whether there is a more direct way to apply the slice rank polynomial method to this problem. The answer is yes, and in fact we can improve the constant $c$ with this direct approach:


Theorem 107 (Naslund–Sawin, 2017)
Let $\mathcal{F}$ be a collection of subsets of $\{1, \dots, n\}$, such that no distinct subsets $A, B, C \in \mathcal{F}$ form a sunflower. Then $|\mathcal{F}| \le 3(n+1) \sum_{k \le n/3} \binom{n}{k}$ (which is at most $1.89^n$ for large enough $n$ by a Chernoff bound calculation).

The idea is to prove a version of this problem where all subsets in $\mathcal{F}$ are of the same size; since there are only $n + 1$ possible sizes and we're dealing with bounds that are exponential in $n$, this is really a lower order term. Specifically, we'll first prove the following result:


Proposition 108 


Let $\mathcal{F}$ be a collection of subsets of $\{1, \dots, n\}$ all of the same size, such that no distinct $A, B, C \in \mathcal{F}$ form a sunflower. Then $|\mathcal{F}| \le 3 \sum_{k \le n/3} \binom{n}{k}$.

This implies the theorem, because we can consider each of the $n + 1$ possible set sizes separately, and if we have a bound $3 \sum_{k \le n/3} \binom{n}{k}$ on how many subsets of size $s$ we can have, then the total number of sets in $\mathcal{F}$ (for which $s \in \{0, \dots, n\}$) can only be at most $(n + 1)$ times this bound. (Notice that this bound is only really useful if the set size is between $\frac{n}{3}$ and $\frac{2n}{3}$; it makes sense to ask whether we can still get an interesting bound if the set size is $\frac{n}{10}$, for example, but there isn't an immediately clear answer.)

Proof. We'll fix some notation: let $\mathcal{F} = \{A_1, \dots, A_m\}$, where each $A_i$ is a subset of $\{1, \dots, n\}$. We wish to find a bound on $m$, and our first step is to find a polynomial on which we can apply the slice rank polynomial method. Let $x_1, \dots, x_m$ be the indicator vectors of $A_1, \dots, A_m$ (meaning that $x_i^{(j)} = 1$ if $j$ is in the set $A_i$, and $x_i^{(j)} = 0$ otherwise).

Our condition that $\mathcal{F}$ has no sunflowers is equivalent to saying that we cannot have $A_i \cap A_j = A_i \cap A_k = A_j \cap A_k$ for distinct $i, j, k$. However, notice that that intersection condition is the same as saying “any element is in either 0, 1, or 3 of the sets $A_i, A_j, A_k$.”

In other words, for any distinct $i, j, k$, we require that there is some element $s \in \{1, \dots, n\}$ that is in exactly two of the sets $A_i, A_j, A_k$, meaning that $x_i^{(s)} + x_j^{(s)} + x_k^{(s)} = 2$. Furthermore, if $i = j = k$, there will not be any $s$ such that $x_i^{(s)} + x_j^{(s)} + x_k^{(s)} = 2$ (it's always going to be 0 or 3), and if exactly two of $i, j, k$ are the same (without loss of generality let's say $i = j \ne k$) then there will be some element $s$ in $A_i = A_j$ but not in $A_k$ (here's where we use the fact that the sets are of the same size), so that $x_i^{(s)} = x_j^{(s)} = 1$ but $x_k^{(s)} = 0$, so again $x_i^{(s)} + x_j^{(s)} + x_k^{(s)} = 2$. Putting this together, our sunflower-avoidance means that we can always find an $s$ such that $x_i^{(s)} + x_j^{(s)} + x_k^{(s)} = 2$ unless $i = j = k$, and that now looks a lot like a diagonal condition.
Motivated by this, we'll define a function $f : \{1, \dots, m\} \times \{1, \dots, m\} \times \{1, \dots, m\} \to \mathbb{R}$ (any field here works as long as it's not of characteristic 2) such that

$$f(i, j, k) = \prod_{s=1}^{n} \left( x_i^{(s)} + x_j^{(s)} + x_k^{(s)} - 2 \right).$$

This is a degree $n$ polynomial in the coordinates of $x_i, x_j, x_k$ (good because it means we can use the “polynomial splitting” trick from our previous proof, and also good because the degree is not too large), and from our discussion above we can check that $f(i, j, k)$ is nonzero if and only if $i = j = k$ (that's the only case where our product has no factors of zero). So by Tao's lemma for the slice rank of a diagonal function, the slice rank must be the size of the index set, which is $m$.

As alluded to, we'll now get a bound on the slice rank by splitting up the polynomial into slice rank 1 functions. The polynomial that defines $f$ has degree $n$, and each variable (meaning the components of $x_i$, $x_j$, or $x_k$) has degree at most 1. If we multiply out the product for $f(i, j, k)$, then for each monomial that appears, one of the $x_i^{(s)}$ variables, the $x_j^{(s)}$ variables, or the $x_k^{(s)}$ variables have sum of degrees at most $\frac{n}{3}$ by the pigeonhole principle. Now just like last time, put each monomial in the corresponding group, and then (by combining terms) we can write $f$ as a sum of terms of the form

$$(x_i^{(1)})^{d_1} \cdots (x_i^{(n)})^{d_n} \cdot (\text{polynomial in } x_j, x_k),$$

plus similar terms $(x_j^{(1)})^{d_1} \cdots (x_j^{(n)})^{d_n} \cdot (\text{polynomial in } x_i, x_k)$ and $(x_k^{(1)})^{d_1} \cdots (x_k^{(n)})^{d_n} \cdot (\text{polynomial in } x_i, x_j)$, such that $d_1 + \cdots + d_n \le \frac{n}{3}$ and with each $d_i \in \{0, 1\}$. Each of these terms is then a slice rank 1 function (because it's a function in $x_i$ times a function in $x_j$ and $x_k$, or similar), so the slice rank must be bounded by (just counting the number of terms)

$$m = \text{slice rank}(f) \le 3 \cdot \left| \left\{ (d_1, \dots, d_n) : d_1 + \cdots + d_n \le \frac{n}{3},\ d_i \in \{0, 1\} \right\} \right|,$$

since the slice rank is the minimum number of slice rank 1 functions required. And this is exactly $3 \sum_{k \le n/3} \binom{n}{k}$, since whenever $d_1 + \cdots + d_n = k$ there are $\binom{n}{k}$ ways to pick $k$ of the $d_i$s to be 1 and the others to be 0.
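To see the diagonal condition and the final bound in action, here is a small brute-force sketch. The family below (four 2-element subsets of $\{1, 2, 3, 4\}$) is just an example we picked, and the helper names are ours; the code checks that the family is sunflower-free, that $f(i, j, k)$ is nonzero exactly on the diagonal, and evaluates $3 \sum_{k \le n/3} \binom{n}{k}$.

```python
from itertools import combinations, product
from math import comb

def is_sunflower(A, B, C):
    """True if all three pairwise intersections coincide."""
    return (A & B) == (A & C) == (B & C)

def f(family, n, i, j, k):
    """Product over s of (x_i^(s) + x_j^(s) + x_k^(s) - 2), using indicator vectors."""
    value = 1
    for s in range(1, n + 1):
        value *= (s in family[i]) + (s in family[j]) + (s in family[k]) - 2
    return value

def slice_rank_bound(n):
    """3 times the sum of C(n, k) over k <= n/3."""
    return 3 * sum(comb(n, k) for k in range(n // 3 + 1))

n = 4
# Assumed example family: all sets have size 2.
family = [frozenset(S) for S in ({1, 2}, {2, 3}, {3, 4}, {1, 4})]
m = len(family)

sunflower_free = not any(is_sunflower(A, B, C) for A, B, C in combinations(family, 3))
diagonal = all(
    (f(family, n, i, j, k) != 0) == (i == j == k)
    for i, j, k in product(range(m), repeat=3)
)
print(sunflower_free, diagonal, m, slice_rank_bound(n))
```

For such a small $n$ the bound is far from sharp; the point is only that the family size never exceeds it.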


Notice that it's important in this proof that the sum $\sum_{k \le n/3} \binom{n}{k}$ isn't summing over $k \le \frac{n}{2}$, for example, or else we won't get an exponentially good bound. And this detail is very similar to the Ellenberg–Gijswijt proof (in fact this proof is slightly cleaner).


Fact 109 


A known lower bound for the sunflower problem, according to the Naslund–Sawin paper, is approximately $1.55^n$, but this is an unpublished result by Naslund (along with lower bounds for the cap-set problem).


We'll now briefly mention another famous sunflower problem variant, the Erdős–Rado sunflower problem:


Problem 110 


Fix some integer $k$. What is the maximum size of a family $\mathcal{F}$ of sets of size $k$ (with no restriction on the ground set), such that $\mathcal{F}$ does not contain three distinct sets $A, B, C$ forming a sunflower?


We do need some restriction on the size of the sets: otherwise, we could use the sets $A_i = \{1, 2, \dots, i\}$ and form an infinite family with no sunflowers. So the two natural restrictions are on the ground set and on the sizes of the sets, and we've considered both here. In this version, it's not immediately clear that this maximum size is finite, but it does turn out to be (by a combinatorial argument we won't talk about here). We'll instead say a bit about the known bounds: it was conjectured by Erdős and Rado that $|\mathcal{F}| \le c^k$ for some absolute constant $c$, and the best known bound is the recent result by Alweiss, Lovett, Wu, and Zhang (2019) that $|\mathcal{F}| \le (c \log k \log\log k)^k$, greatly improving the previous bound in which the base of the exponent was polynomial in $k$.


Remark 111. We've been discussing sunflowers of size 3, but many of these arguments can work if we have $\ell$-sunflowers (sets of $\ell$ sets such that all pairwise intersections coincide). And if we try to take sunflowers with more sets at once, the picture looks more like an actual sunflower with petals. The problem then becomes harder: the slice rank polynomial method doesn't work anymore and no exponentially good bound is known. But it's also worth mentioning that in the Erdős–Rado sunflower problem, that proof does work more generally and just gains some factors of $\ell$ in the base of the exponential.

In these last three lectures (as voted on by the class), we'll discuss lower bounds for extremal numbers of bipartite graphs (via randomized and algebraic constructions). To get in the mood for these kinds of questions, we'll start with a classic problem from more than a century ago:


Problem 112 


What is the maximum number of edges that an $n$-vertex graph can have without containing a triangle?

This is a classic problem that we may have seen before: the answer is $\lfloor n^2/4 \rfloor$, obtained from a complete bipartite graph with $\lfloor n/2 \rfloor$ vertices on one side and $\lceil n/2 \rceil$ on the other. We won't go over the proof here, but it's something we can look up; instead, we'll generalize the problem:


Definition 113 


Let H be a fixed graph. For any n, let ex(n, H) be the maximum number of edges of an n-vertex graph without a 


copy of H as a subgraph. 


(For example, if $H$ is the triangle graph, then $\mathrm{ex}(n, H) = \lfloor n^2/4 \rfloor$. And it's important that we're asking $H$ to not be contained as a subgraph, rather than as an induced subgraph; after all, if $H$ is not a clique, we could just take the complete graph on $n$ vertices and we would not have a copy of $H$ as an induced subgraph.)


Our first result gives us an asymptotic fraction of edges that we may include as n gets large: 


Theorem 114 (Erdős–Stone–Simonovits)
For any fixed graph $H$, let $\chi(H)$ denote the chromatic number of $H$ (the minimum number of colors needed to color the vertices of $H$ such that any two adjacent vertices have different colors). Then

$$\mathrm{ex}(n, H) = \left( 1 - \frac{1}{\chi(H) - 1} \right) \binom{n}{2} + o(n^2).$$


To get some intuition for this result, the idea is to split our $n$ vertices into $\chi(H) - 1$ equal-sized parts, and make our graph empty within each part and complete between the parts (so this is a “blow-up” of the clique on $\chi(H) - 1$ vertices). That graph can be colored with $\chi(H) - 1$ colors, so it cannot contain $H$ (which requires $\chi(H)$ colors to properly color). We won't prove this result because it's not directly related to our goal, though.
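As a quick numerical illustration of this blow-up construction, the sketch below counts the edges of the complete multipartite graph with parts as equal as possible and compares that with the main term $\left(1 - \frac{1}{\chi(H) - 1}\right)\binom{n}{2}$. The function names are ours, and the triangle case ($\chi(H) = 3$, two parts) is used as the example.

```python
from math import comb

def blow_up_edges(n, r):
    """Edges of the complete r-partite graph on n vertices with parts as equal as possible."""
    sizes = [n // r + (1 if i < n % r else 0) for i in range(r)]
    # All pairs of vertices are edges except those inside a common part.
    return comb(n, 2) - sum(comb(size, 2) for size in sizes)

def main_term(n, chi):
    """(1 - 1/(chi - 1)) * C(n, 2), the leading term in Theorem 114."""
    return (1 - 1 / (chi - 1)) * comb(n, 2)

for n in (10, 100, 1000):
    print(n, blow_up_edges(n, 2), main_term(n, 3))
```

For two parts the edge count is $\lfloor n^2/4 \rfloor$, and the ratio to the main term tends to 1 as $n$ grows.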

This result may look like it gives us a good sense of the number of allowed edges, but in the case where $H$ is bipartite (and thus the chromatic number is 2), we don't actually get the asymptotic behavior: in that case we just know that $\mathrm{ex}(n, H) = o(n^2)$, and the problem is in fact still open in general. In fact, even for the simplest bipartite graphs, the complete bipartite graphs $K_{s,t}$ (in which we have $s$ vertices on one side, $t$ vertices on the other, and the only edges are the $st$ edges between the two sides), the problem is very difficult and still mostly open.

We'll discuss this case $H = K_{s,t}$ in class today, though, starting with an upper bound:


Theorem 115 (Kővári–Sós–Turán (1954))
Fix $s \le t$. Then there exists a constant $C_{s,t}$ such that for all $n$, we have

$$\mathrm{ex}(n, K_{s,t}) \le C_{s,t} \, n^{2 - 1/s}.$$

In particular, this answer is indeed always $o(n^2)$ (as predicted by Theorem 114), and we get a weaker bound for larger $s$ (which makes sense, because larger $s$ means we have a larger subgraph to avoid, which is easier to do).


Proof. To show this upper bound, we must show that any graph $G$ on $n$ vertices that does not contain $K_{s,t}$ has at most $C_{s,t} n^{2 - 1/s}$ edges. We'll count the number of $(s+1)$-tuples $(x_1, \dots, x_s, y) \in V(G)^{s+1}$ such that each $x_i$ is connected by an edge to $y$ (in other words, $(x_i, y)$ is an edge). Call this number $N$. We know that

$$N = \sum_{(x_1, \dots, x_s) \in V(G)^s} \left| \{ y \in V(G) : (x_i, y) \in E(G) \ \forall\, 1 \le i \le s \} \right|,$$

and now we can break this up into two cases: if the $x_i$s are all distinct, then we cannot have more than $t - 1$ options for $y$, or else we'd contain a copy of $K_{s,t}$, and in general we can only have $n$ options for $y$ since there are $n$ vertices in the graph, so this is

$$N \le \sum_{\substack{(x_1, \dots, x_s) \in V(G)^s \\ \text{distinct}}} (t - 1) + \sum_{\substack{(x_1, \dots, x_s) \in V(G)^s \\ \text{not distinct}}} n \le t \, n^s + \binom{s}{2} n^{s-1} \cdot n \le (t + s^2) n^s$$

(crudely bounding both terms, with the bound $\binom{s}{2} n^{s-1}$ on the number of non-distinct tuples coming from choosing two indices to be equal and doing a union bound over all possible choices). On the other hand, we may count $N$ by starting off with $y$ (at which point each of the $x_i$s can be any of $\deg(y)$ choices):

$$N = \sum_{y \in V(G)} (\deg(y))^s = n \cdot \frac{1}{n} \sum_{y \in V(G)} (\deg(y))^s,$$

and now we make use of the convexity of the function $z \mapsto z^s$ through Jensen's inequality (this is also the power mean inequality):

$$N \ge n \cdot \left( \frac{1}{n} \sum_{y \in V(G)} \deg(y) \right)^s = n^{1-s} \left( \sum_{y \in V(G)} \deg(y) \right)^s = n^{1-s} (2|E(G)|)^s.$$

Putting our inequalities together, we find that

$$2^s n^{1-s} |E(G)|^s \le N \le (t + s^2) n^s \implies |E(G)|^s \le 2^{-s} (t + s^2) n^{2s-1},$$

and taking $s$-th roots gives us the desired result (with $C_{s,t} = \frac{1}{2} (t + s^2)^{1/s}$).
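The double count in this proof is easy to replay on a small graph. The sketch below is only an illustration: it uses the cycle $C_7$ (our choice of a $K_{2,2}$-free example), counts $N$ both over tuples $(x_1, \dots, x_s)$ and over the vertex $y$, and compares the number of edges with $\frac{1}{2}(t + s^2)^{1/s} n^{2 - 1/s}$. The function names are ours.

```python
from itertools import product

def count_both_ways(adj, s):
    """Count tuples (x_1, ..., x_s, y) with every x_i adjacent to y, in two ways."""
    vertices = list(adj)
    # First way: for each tuple of x's, count the common neighbours y.
    by_tuples = sum(
        len(set.intersection(*(adj[x] for x in xs)))
        for xs in product(vertices, repeat=s)
    )
    # Second way: for each y, each x_i is one of deg(y) neighbours.
    by_degrees = sum(len(adj[y]) ** s for y in vertices)
    return by_tuples, by_degrees

def kst_bound(n, s, t):
    """The bound (1/2) (t + s^2)^(1/s) n^(2 - 1/s) from the proof."""
    return 0.5 * (t + s * s) ** (1 / s) * n ** (2 - 1 / s)

n, s, t = 7, 2, 2
adj = {v: {(v - 1) % n, (v + 1) % n} for v in range(n)}  # cycle C_7, which contains no K_{2,2}
edges = sum(len(nb) for nb in adj.values()) // 2
N_tuples, N_degrees = count_both_ways(adj, s)
print(N_tuples, N_degrees, edges, kst_bound(n, s, t))
```

Because the cycle is regular, Jensen's inequality holds with equality here, so $N$ coincides with $n^{1-s}(2|E(G)|)^s$.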


Fact 116
It's conjectured that this result is tight up to the constant factor (because this proof strategy is really the only one that's known). That's known for $K_{2,2}$, which implies the result for $K_{2,t}$ (because if we can construct an example for $K_{2,2}$, that example also works for $K_{2,t}$ and we still have the same bound in the theorem), as well as for $K_{3,3}$ and thus $K_{3,t}$ (Brown, 1966). But the true order of growth for $K_{4,4}$ is unknown, and the only other thing known (Bukh, 2021) is that if $t > 9^s s^{4s^{2/3}}$ (which is basically $9^{s + o(s)}$) we also have a tight upper bound (though 20 years ago it was known for $t \ge s! + 1$ and then $t \ge (s-1)! + 1$).

All of the constructions above are algebraic, but some of them make use of a “randomized algebraic method.” To motivate that a bit and understand how we construct these lower bounds, consider the following setting:


Example 117
Let $p$ be a large prime. We will construct a bipartite graph $G$, such that the left and right vertex sets are both $\mathbb{F}_p^s$, and such that $G$ does not contain any copies of $K_{s,t}$.


We may have a few concerns with this setup, which we'll address now. First of all, notice that we only lose at most a factor of 2 when we go from a general graph $G$ to a bipartite one, because we can always take a random bipartition of $G$, meaning that each vertex of $G$ goes into one of the two sides and we delete all edges within each side. We keep each edge with probability $\frac{1}{2}$ through this, so there is indeed always a bipartition for which at least half the edges are kept. Second, even though this construction only gives us graphs with $n = 2p^s$ vertices (where $p$ is a prime), notice that for any sufficiently large $n \in \mathbb{N}$, there is some prime $p$ such that $\frac{n}{2^{s+1}} \le p^s \le \frac{n}{2}$ (by Bertrand's postulate, there is always a prime between $m$ and $2m$). So we can basically take $2p^s$ of the vertices and do this construction, isolate the remaining $n - 2p^s$ vertices, and because $n$ is only off from $2p^s$ by at most a constant factor $2^s$ we don't lose any asymptotic strength here either.
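The choice of the prime $p$ can be made concrete with a few lines of code. This is a naive sketch (trial division and a linear scan, both our own choices, fine only for small $n$) that returns the largest prime $p$ with $2p^s \le n$.

```python
def is_prime(m):
    """Trial-division primality test, adequate for small m."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True

def largest_prime_for(n, s):
    """Largest prime p with 2 * p**s <= n, or None if no prime is small enough."""
    best = None
    p = 2
    while 2 * p ** s <= n:
        if is_prime(p):
            best = p
        p += 1
    return best

n, s = 1000, 2
p = largest_prime_for(n, s)
print(p, 2 * p ** s, n)
```

By Bertrand's postulate, whenever $n \ge 2^{s+1}$ the prime returned satisfies $n \le 2^{s+1} p^s$, so the constructed graph uses at least a $2^{-s}$ fraction of the $n$ vertices.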

The construction works as follows (and this is where the algebraic structure comes in): for any $(x, y) \in \mathbb{F}_p^s \times \mathbb{F}_p^s$, we decide whether to include the edge between $x$ and $y$ via an algebraic condition. This is a bit complicated, so we'll describe some simple cases for illustration:

• For $K_{2,2}$, we have $x = (x_1, x_2)$ and $y = (y_1, y_2)$, and we include an edge between $x$ and $y$ if and only if $x_1 y_1 + x_2 y_2 = 1$.

• For $K_{3,3}$ and $p \equiv 3 \pmod 4$, we connect $(x_1, x_2, x_3)$ to $(y_1, y_2, y_3)$ if and only if $(x_1 - y_1)^2 + (x_2 - y_2)^2 + (x_3 - y_3)^2 = 1$.

• For $K_{s,t}$ with $t \ge s! + 1$ (one of the earlier bounds in Fact 116), instead of taking $\mathbb{F}_p^s$ we in fact want to take $\mathbb{F}_{p^s}$ (which has additive structure isomorphic to $\mathbb{F}_p^s$ but also has a field structure). Then we include an edge $(x, y) \in \mathbb{F}_{p^s} \times \mathbb{F}_{p^s}$ if and only if $\mathrm{Norm}(x + y) = 1$ in $\mathbb{F}_p$. (If we haven't taken a Galois theory class, the norm of an element $a \in \mathbb{F}_{p^s}$ is the product of all Galois conjugates of $a$. But alternatively, $\mathrm{Norm}(a)$ is the determinant of the $\mathbb{F}_p$-linear map $\mathbb{F}_{p^s} \to \mathbb{F}_{p^s}$ which sends $z$ to $az$, and that explains why $\mathrm{Norm}(a)$ is always in $\mathbb{F}_p$.)

It's difficult to analyze the latter two constructions, but we can do so for $K_{2,2}$ and we will do so now (remember we're doing this to check that Kővári–Sós–Turán is tight for $s = 2$).


• First, we calculate how many vertices and edges our construction gives. We have $n = 2p^2$ vertices, and our goal is to get $\Theta(p^3)$ edges (since $2 - 1/s = 3/2$ and we want $n^{3/2}$ edges up to a constant factor). If $(x_1, x_2) = (0, 0)$, then there are no edges from $x$ to any of the other vertices. But otherwise, we always have $p$ choices for $(y_1, y_2)$ because we have a nontrivial linear equation in $\mathbb{F}_p^2$, cutting out a line. Thus, we indeed have $p(p^2 - 1) \approx p^3$ edges in this graph we've constructed.

• Now we must check that there is no $K_{2,2}$ in this graph. In other words, we must show that for any distinct $(x_1, x_2)$ and $(x_1', x_2')$, there is at most one $(y_1, y_2)$ satisfying $x_1 y_1 + x_2 y_2 = 1$ and $x_1' y_1 + x_2' y_2 = 1$. (Normally we would need to make sure that there aren't any other edges in the induced subgraph $H$, but because we constructed $G$ to be bipartite and $K_{2,2}$ is complete bipartite we don't need to worry about that.) But either these are linearly independent equations, or one of $(x_1, x_2)$, $(x_1', x_2')$ is a scalar multiple of the other (with scalar different from 1, which makes the two equations inconsistent), so there must be exactly one or zero solutions, respectively.
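Both bullet points can be checked by brute force for small primes. The following is a minimal sketch (the function names are our own, not from lecture) that builds the graph and verifies the edge count and the absence of $K_{2,2}$.

```python
from itertools import combinations, product


def build_graph(p):
    """Map each left vertex x to its neighbors y, where x1*y1 + x2*y2 = 1 in F_p."""
    points = list(product(range(p), repeat=2))
    return {
        x: {y for y in points if (x[0] * y[0] + x[1] * y[1]) % p == 1}
        for x in points
    }


def edge_count(graph):
    """Total number of edges, counted from the left side."""
    return sum(len(nbrs) for nbrs in graph.values())


def max_common_neighbors(graph):
    """Largest common neighborhood over all pairs of distinct left vertices."""
    return max(len(graph[x] & graph[xp]) for x, xp in combinations(graph, 2))
```

A copy of $K_{2,2}$ would need two left vertices with at least two common neighbors, so it suffices that `max_common_neighbors` never exceeds 1.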


Remark 118. We can think about this $K_{2,2}$ construction by instead thinking about the finite projective plane over $\mathbb{F}_p$, where the vertices on the left are the points in the projective plane, the vertices on the right are the lines in the projective plane, and edges are drawn corresponding to incidences. This is basically the previous construction “deleting the zeros,” and it's a nice tool for generating constructions in various situations. (In fact, this is how the game Dobble generates its cards, each containing 6 or 8 symbols, so that each pair of cards has exactly one symbol in common.)
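As a sketch of the projective-plane point of view, we can generate a Dobble-style deck directly: one card per line, whose symbols are the points on that line. The function names below are hypothetical, and this only illustrates the incidence structure, not how the commercial game is actually produced.

```python
from itertools import combinations


def projective_points(p):
    """One normalized representative per point of the projective plane over F_p."""
    pts = [(1, a, b) for a in range(p) for b in range(p)]
    pts += [(0, 1, a) for a in range(p)]
    pts.append((0, 0, 1))
    return pts


def dobble_deck(p):
    """One card per line; its symbols are the indices of the points on that line."""
    pts = projective_points(p)
    lines = projective_points(p)  # lines have the same normalized coordinates
    return [
        frozenset(
            i for i, q in enumerate(pts)
            if sum(a * b for a, b in zip(line, q)) % p == 0
        )
        for line in lines
    ]
```

For $p = 7$ this gives $p^2 + p + 1 = 57$ cards with $p + 1 = 8$ symbols each, and for $p = 5$ it gives 31 cards with 6 symbols each.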


Remark 119. The reason we cannot use $x_1 y_1 + x_2 y_2 + x_3 y_3 = 1$ for the $K_{3,3}$ case is that (even though we get the right number of edges) the graph will not be $K_{3,3}$-free: we can pick three triples $(x_1, x_2, x_3)$ so that the corresponding hyperplanes all intersect along a common line. Then picking any three points $(y_1, y_2, y_3)$ along that line gives us a $K_{3,3}$. Instead, the $(x_1 - y_1)^2 + (x_2 - y_2)^2 + (x_3 - y_3)^2 = 1$ construction basically comes down (non-rigorously) to the fact that three unit spheres can only intersect in two points.
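The failure of the dot-product condition in dimension 3 is easy to see in a tiny example. In this sketch the prime $p = 5$ and the three left vertices are choices made purely for illustration.

```python
from itertools import product


def common_neighbors(xs, p):
    """All y in F_p^3 with x . y = 1 (mod p) for every x in xs."""
    return [
        y for y in product(range(p), repeat=3)
        if all(sum(a * b for a, b in zip(x, y)) % p == 1 for x in xs)
    ]


p = 5
# Three planes y1 = 1, y1 + y2 = 1, y1 + 2*y2 = 1 share the line y1 = 1, y2 = 0.
xs = [(1, 0, 0), (1, 1, 0), (1, 2, 0)]
shared = common_neighbors(xs, p)
```

Here `shared` consists of the $p$ points $(1, 0, y_3)$, so any three of them together with the three chosen left vertices form a $K_{3,3}$.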


This kind of construction is hard to do for general $s$ because it requires us to construct specific polynomials to establish the algebraic condition, and the reason we have a new bound $t > 9^s s^{4 s^{2/3}}$ is because there is a nice way to do so randomly. (This type of approach started becoming popular recently; for example, a randomized algebraic construction was used to prove a major result in combinatorial design theory.) What we'll see next time is how such a construction works for our avoiding-bipartite-graphs problem!
23 May 5, 2022


Remark 120. As a reminder, we should all fill out the subject evaluations (this is sort of like a “grade for the instructor”).


Last lecture, we studied the quantity $\mathrm{ex}(n, H)$ (the maximum number of edges in an $n$-vertex graph without a copy of $H$ as a subgraph) where $H$ is the complete bipartite graph $K_{s,t}$. (Erdős-Stone-Simonovits already tells us the asymptotic behavior for the case where $H$ is not bipartite.) We proved the upper bound $\mathrm{ex}(n, K_{s,t}) \le C_{s,t}\, n^{2 - 1/s}$ for any $s \le t$, and today we'll find a matching lower bound for sufficiently large $t$ (relative to $s$). (Remember that it's easier to avoid $K_{s,t}$ if $t$ is larger, so ideally we would do this for $K_{s,s}$, but that's only known for $s = 2, 3$.) We will follow Bukh's argument from 2014; although Kollár, Rónyai, and Szabó (1996), and Alon, Rónyai, and Szabó (1999) already previously yielded better results, those bounds were obtained through explicit constructions that do not generalize. In contrast, Bukh's argument is more general and was in fact recently improved (in 2021) to $t > 9^s s^{4 s^{2/3}}$, better than the previously known $t \ge (s - 1)! + 1$.


Theorem 121
If $t$ is sufficiently large with respect to $s$, then $\mathrm{ex}(n, K_{s,t}) \ge c_{s,t}\, n^{2 - 1/s}$.


Proof. To prove this result, we must construct an example of a graph on $n$ vertices with $c_{s,t}\, n^{2 - 1/s}$ edges not containing any copies of $K_{s,t}$. Like last lecture, we may assume $n = 2p^s$ for some prime $p$ by Bertrand's postulate, and we may construct our graph to be bipartite (these will only contribute constant-in-$n$ factors to the bound). Rewriting the goal in terms of $p$, we wish to construct a bipartite graph $G$ with $p^s$ vertices on each side, with at least $c_{s,t}\, p^{2s-1}$ edges (possibly a different $c_{s,t}$ than above), and with no copies of $K_{s,t}$.

To do this, we identify each side of our graph with $\mathbb{F}_p^s$ ($s$-tuples of elements of $\mathbb{F}_p$), and we connect $x$ on the left side with $y$ on the right side according to some algebraic condition. We did this last time for the $K_{2,2}$ case with an explicit example of the polynomial, but Bukh's key insight is that we can now pick a random polynomial $f \in \mathbb{F}_p[x_1, \dots, x_s, y_1, \dots, y_s]$ and connect $x$ and $y$ in $G$ if and only if $f(x_1, \dots, x_s, y_1, \dots, y_s) = 0$. However, we'll need to be a bit more precise with this: there are only finitely many possible evaluation functions, but we're going to want to think of $f$ as an actual polynomial rather than just as a function.

Specifically, we'll let $d = s^3 - 1$ (we'll see why we choose this particular value later), and among all polynomials in $\mathbb{F}_p[x_1, \dots, x_s, y_1, \dots, y_s]$ of degree at most $d$, we pick one uniformly at random. (There are various variations on how to set this up exactly: we could use homogeneous polynomials, or require that the degrees in $x$ and $y$ are the same, but this is a relatively simple one to state.) Naively, for any $x$ and $y$ it makes sense that $f(x, y) = 0$ occurs with probability $\frac{1}{p}$, but we need to be more precise with that if we want to consider graphs of the form $K_{s,t}$:


Lemma 122
Let $(x^{(1)}, y^{(1)}), \dots, (x^{(m)}, y^{(m)}) \in \mathbb{F}_p^s \times \mathbb{F}_p^s$ be distinct pairs of points with $m \le d + 1$ (though the same $x$ can show up multiple times in different pairs). Then

$$\mathbb{P}\left(f(x^{(i)}, y^{(i)}) = 0 \text{ for all } i = 1, \dots, m\right) = \frac{1}{p^m}.$$

In other words, if we have sufficiently few points, there won't be nasty dependencies in the probabilities of vanishing. (And we can really just think of each $(x^{(i)}, y^{(i)})$ as a point in $\mathbb{F}_p^{2s}$ at which we evaluate a polynomial $f$ in $2s$ variables.)


Proof of lemma. Notice that for each $1 \le i \le m$, we can find a polynomial $P_i \in \mathbb{F}_p[x_1, \dots, x_s, y_1, \dots, y_s]$ of degree at most $m - 1 \le d$, such that $P_i(x^{(j)}, y^{(j)}) = 1$ if $i = j$ and $0$ otherwise. (This is because for each $j \ne i$, we may pick a hyperplane (degree-1 polynomial) through $(x^{(j)}, y^{(j)})$ but not through $(x^{(i)}, y^{(i)})$, and multiply all $m - 1$ such hyperplanes together to get a polynomial vanishing on all points except $(x^{(i)}, y^{(i)})$, then rescale.) The polynomials $P_1, \dots, P_m$ are then linearly independent, so we can extend $\{P_1, \dots, P_m\}$ to a basis of the space of all polynomials of degree at most $d$, which we'll write $\{P_1, \dots, P_k\}$ (where $k = \binom{2s + d}{d}$).

We can now pick $f$ uniformly at random by picking uniformly random coefficients for each element of the basis. While a priori it might be more natural to pick coefficients for the monomials at random, for this problem it's more insightful to pick $a_1, \dots, a_k \in \mathbb{F}_p$ independently and uniformly at random and set $f = a_1 P_1 + \cdots + a_k P_k$. And now if we pick $a_{m+1}, \dots, a_k$ first, the polynomial $g = a_{m+1} P_{m+1} + \cdots + a_k P_k$ takes some values on the evaluation points $(x^{(1)}, y^{(1)}), \dots, (x^{(m)}, y^{(m)})$. But no matter what those coefficients are, we have $f(x^{(i)}, y^{(i)}) = a_i + g(x^{(i)}, y^{(i)})$, so the event $f(x^{(i)}, y^{(i)}) = 0$ now only depends on $a_i$ (not on any of the other coefficients $a_1, \dots, a_m$) and occurs with probability $\frac{1}{p}$. Since the $a_i$s are independent, this gives the desired result.
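For very small parameters the lemma can be confirmed exhaustively rather than probabilistically. The sketch below enumerates every polynomial of degree at most $d$ (by its monomial coefficients) and counts those vanishing at the given points; the function names and the tiny parameters in the usage note are our own choices.

```python
from itertools import product


def monomials(num_vars, d):
    """Exponent vectors of all monomials of total degree at most d."""
    return [e for e in product(range(d + 1), repeat=num_vars) if sum(e) <= d]


def evaluate(coeffs, monos, point, p):
    """Evaluate the polynomial with the given monomial coefficients at a point, mod p."""
    total = 0
    for c, e in zip(coeffs, monos):
        term = c
        for x, k in zip(point, e):
            term = term * pow(x, k, p) % p
        total = (total + term) % p
    return total


def count_vanishing(points, d, p):
    """Number of polynomials of degree <= d that vanish at every given point."""
    monos = monomials(len(points[0]), d)
    return sum(
        all(evaluate(c, monos, pt, p) == 0 for pt in points)
        for c in product(range(p), repeat=len(monos))
    )
```

With $p = 3$, two variables, and $d = 2$ there are $3^6 = 729$ polynomials, and for $m \le 3$ distinct points exactly $729 / 3^m$ of them vanish at all $m$ points, matching the probability $p^{-m}$.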


We can now return to the properties of $G$. The expected number of edges of $G$ when we pick $f$ randomly (applying Lemma 122 with $m = 1$ and using linearity of expectation) is $p^s \cdot p^s \cdot \frac{1}{p} = p^{2s-1}$. The idea is that we wish to show that $G$ will have few copies of $K_{s,t}$ (in particular, at most half as many as $p^{2s-1}$), because then we can get a new graph with no copies of $K_{s,t}$ by deleting an edge from each $K_{s,t}$ subgraph.

To control the number of copies of $K_{s,t}$, we can just look at those with $s$ vertices on the left and $t$ vertices on the right (we will account for a factor of 2 later to count the other way around by symmetry). Having a $K_{s,t}$ means that if we look at a set of $s$ vertices on the left and look at their neighborhoods, those neighborhoods intersect in at least $t$ points. (So we won't exactly count the number of copies of $K_{s,t}$, only the number of problematic $s$-sets, and we'll delete all the edges at one left vertex of each problematic set to fix it.) For any subset $U$ on the left of size $s$, we define $N(U) \subseteq \mathbb{F}_p^s$ to be the set of its common neighbors. For any fixed $U$, we then have


$$\mathbb{E}[|N(U)|] = p^s \cdot \frac{1}{p^s} = 1,$$

because for any point on the right we can apply Lemma 122 to find that there is a probability $\frac{1}{p^s}$ that it's connected to all of $U$ (and then we use linearity of expectation). (Here it's important that $s \le d + 1$.) We can then say that by Markov's inequality, only a fraction $\frac{1}{t}$ of the sets $U$ can be bad, but that's not good enough for us because it's still a constant fraction of all possible sets (and we should remember that $p$ is very large relative to $t$, even if $t$ is very large relative to $s$). So we need to “enhance” Markov's inequality by looking at the $q$th moment, and it will turn out that $q = s^2$ is the right choice. Notice that $|N(U)|^q$ is the number of $q$-tuples of points, not necessarily distinct, on the right that are all connected to every element of $U$, so


$$\mathbb{E}[|N(U)|^q] = \mathbb{E}\left[\#\{(y_1, \dots, y_q) \in (\mathbb{F}_p^s)^q : y_1, \dots, y_q \text{ common neighbors of } U\}\right].$$

We now claim that this expectation is at most a constant depending on $s$. (This is a bit trickier because the number of edges in the constraint may vary depending on whether there are repetitions in the $y_i$s.) For any fixed $q$-tuple $(y_1, \dots, y_q) \in (\mathbb{F}_p^s)^q$ containing $k$ different points (for some $1 \le k \le q = s^2$), the probability that $y_1, \dots, y_q$ are all common neighbors of $U$ is $\frac{1}{p^{sk}}$ (because there are $sk$ different edges that must all be present for this to occur) by Lemma 122, since $sk \le d + 1 = s^3$. Since the number of $q$-tuples $(y_1, \dots, y_q) \in (\mathbb{F}_p^s)^q$ with exactly $k$ different entries is at most $\mathrm{const}(k, q)\, p^{sk}$, which is a constant depending on $s$ times $p^{sk}$, we find that

$$\mathbb{E}[|N(U)|^q] \le \sum_{k=1}^{q} \mathrm{const}(s)\, p^{sk} \cdot \frac{1}{p^{sk}} \le C_s$$


for some constant $C_s$, as desired. This is still not good enough: applying Markov's inequality still gives us a constant fraction $\frac{C_s}{t^q}$ of the total possible sets $U$ that can be problematic, and this still has no $p$-dependence. But here's where the algebra comes in: for every fixed $x$, $f(x_1, \dots, x_s, y_1, \dots, y_s) = 0$ is a polynomial condition in $y$, so we are cutting out an algebraic set (in other words, a variety) when we require a point to be in $N(U)$. And intuitively, the idea is that varieties have dimensions which constrain the number of points that can be in the set $N(U)$:


Fact 123
Suppose $f_1, \dots, f_k \in \mathbb{F}_p[y_1, \dots, y_s]$ are polynomials of degree at most $d$ (not necessarily the same $d$ as above). Then the set $W = \{y \in \mathbb{F}_p^s : f_1(y) = \cdots = f_k(y) = 0\}$ either has $|W| \le \mathrm{const}(s, k, d)$ or $|W| \ge p - \mathrm{const}(s, k, d)\sqrt{p}$ (in particular, $|W| \ge \frac{p}{2}$ for sufficiently large $p$).


(This is essentially the Lang-Weil bound. Usually this kind of result only holds for irreducible varieties over algebraically closed fields, but there are ways to generalize it.) Now for a fixed $U = \{x^{(1)}, \dots, x^{(s)}\}$, we indeed have $N(U) = \{y \in \mathbb{F}_p^s : f(x^{(1)}, y) = f(x^{(2)}, y) = \cdots = f(x^{(s)}, y) = 0\}$ (so in other words, we define $f_i(\cdot) = f(x^{(i)}, \cdot)$). So either $|N(U)| \le C_s$ or $|N(U)| \ge \frac{p}{2}$, and now we can make use of our Markov bound. Assuming that $t > C_s$ (the constant from our $q$th moment bound above, enlarged if necessary so that it also dominates the constant in Fact 123), we now have


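$$\mathbb{P}(|N(U)| > t) = \mathbb{P}\left(|N(U)| \ge \frac{p}{2}\right) = \mathbb{P}\left(|N(U)|^q \ge \left(\frac{p}{2}\right)^q\right) \le \frac{\mathbb{E}[|N(U)|^q]}{(p/2)^q} \le \frac{2^q C_s}{p^q}$$

by Markov's inequality. Since this held for a fixed $U$, and the moment we used is $q = s^2$, we find that the expected number of sets $U$ with $|N(U)| > t$ is at most $(p^s)^s \cdot C_s' p^{-s^2} = C_s'$ (where $C_s' = 2^q C_s$). Similarly, the expected number of sets $U$ on the right side of size $s$ with $|N(U)| > t$ is also at most that constant. So if we now make this “deleting” argument, where we delete all edges adjacent to a particular bad vertex for each $U$,

$$\mathbb{E}\left[|E(G)| - p^s \cdot (\text{number of bad } U)\right] \ge c_{s,t}\, p^{2s-1} - 2 C_s' p^s \ge \frac{c_{s,t}}{2}\, p^{2s-1}$$

for sufficiently large $p$. Thus we can pick some polynomial $f$ such that deleting these bad edges gives us a graph with at least $\frac{c_{s,t}}{2}\, p^{2s-1}$ edges and no copies of $K_{s,t}$, as desired.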
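To get a feel for the first step of the proof, we can sample random polynomials and watch the edge count concentrate around $p^{2s-1}$. This is a rough sketch only: the degree used in the usage note is a small illustrative value rather than the $d$ from the proof, the deletion step is omitted, and all names are hypothetical.

```python
import random
from itertools import product


def monomials(num_vars, d):
    """Exponent vectors of all monomials of total degree at most d."""
    return [e for e in product(range(d + 1), repeat=num_vars) if sum(e) <= d]


def evaluate(coeffs, monos, point, p):
    """Evaluate the polynomial with the given monomial coefficients at a point, mod p."""
    total = 0
    for c, e in zip(coeffs, monos):
        term = c
        for x, k in zip(point, e):
            term = term * pow(x, k, p) % p
        total = (total + term) % p
    return total


def random_algebraic_graph(s, d, p, rng):
    """Edges (x, y) on F_p^s x F_p^s with f(x, y) = 0 for a uniformly random f."""
    monos = monomials(2 * s, d)
    coeffs = [rng.randrange(p) for _ in monos]  # uniform over degree <= d
    side = list(product(range(p), repeat=s))
    return [(x, y) for x in side for y in side
            if evaluate(coeffs, monos, x + y, p) == 0]


def average_edge_count(s, d, p, trials, seed=0):
    """Average number of edges over several independent random polynomials."""
    rng = random.Random(seed)
    total = sum(len(random_algebraic_graph(s, d, p, rng)) for _ in range(trials))
    return total / trials
```

For instance, `average_edge_count(2, 2, 3, 200)` should land near $p^{2s-1} = 27$, since each of the $81$ pairs is an edge with probability $\frac{1}{3}$.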


24 May 10, 2022 


We'll continue last week's discussion on extremal numbers of bipartite graphs today. In particular, we've previously discussed the Kővári-Sós-Turán upper bound on $\mathrm{ex}(n, K_{s,t})$ (with a double-counting argument), and we've also obtained a matching lower bound (up to constants) on $\mathrm{ex}(n, K_{s,t})$ for sufficiently large $t$ (with a probabilistic argument).


Fact 124
To discuss what's known in more detail for general bipartite graphs, the value of $\mathrm{ex}(n, H)$ is known up to constant factors for $K_{s,t}$ (as discussed in lecture) for $s = 2, 3$ or $t > 9^s s^{4 s^{2/3}}$ (though we did not show this strongest bound).

Other than that, we also know $\mathrm{ex}(n, H)$ for (1) a tree, (2) a cycle of length 4, 6, or 10, or (3) a collection of $\ell$ paths of length $k$ all connected at the start and end point for sufficiently large $\ell$ relative to $k$. (In particular, this is a single cycle if $\ell = 2$.) The answers are (1) $\Theta_H(n)$ (meaning linear in $n$ with constant factor depending on $H$), (2) $\Theta_k(n^{1 + 1/k})$ for a cycle of length $2k$, and (3) $\Theta_{k,\ell}(n^{1 + 1/k})$, respectively.


Recall that for graphs $H$ that are not bipartite, $\mathrm{ex}(n, H)$ is asymptotically proportional to $n^2$, and for all of the bipartite graphs above we have $\mathrm{ex}(n, H) = \Theta_H(n^\alpha)$ for some rational number $1 \le \alpha < 2$. We can check that for any graph $H$ with more than one edge, we must have $1 \le \alpha \le 2$ if the answer is of this form, but proving that $\alpha$ is always rational, or even that the asymptotic behavior always has this form, is still open. And there is in fact a relevant “backwards” conjecture here (still widely open, though there is active progress being made):


Conjecture 125 (Erdős-Simonovits)
For every rational $1 \le \alpha \le 2$, there is some graph $H$ with $\mathrm{ex}(n, H) = \Theta_H(n^\alpha)$.


However, we can also extend our definition of $\mathrm{ex}(n, H)$ to finite families $\mathcal{H}$ of graphs and let $\mathrm{ex}(n, \mathcal{H})$ be the maximum number of edges in an $n$-vertex graph which contains no graph $H \in \mathcal{H}$. Then there are results that are known:


Theorem 126 (Bukh-Conlon (2018))
For every rational $1 \le \alpha \le 2$, there is a finite family $\mathcal{H}$ of graphs such that $\mathrm{ex}(n, \mathcal{H}) = \Theta_{\mathcal{H}}(n^\alpha)$.


(For $\alpha = 1$ we can take a tree, and for $\alpha = 2$ we can take any non-bipartite graph, so we only need to consider rational numbers strictly between 1 and 2.) In order to understand this result, we'll describe a construction. Consider a tree $T$, and let $R \subseteq V(T)$ be a subset of the vertices with no edges between vertices in $R$ (in the paper, these are called the “roots” of the tree, because we can imagine placing the vertices in $R$ at the bottom; all roots will turn out to always be leaves in our construction). Normally we would call these “independent sets,” but we will reserve the use of “independent” for probabilistic arguments later.

We'll then consider many copies of the same tree and glue along roots. Specifically, we'll consider all ways that we can glue $t$ different copies of $T$ along the set $R$, such that all $t$ copies of any $r \in R$ are identified, and different non-root vertices may be identified as well as long as we don't literally glue two copies of $T$ on top of each other. Here's a diagram of some ways in which we may identify two copies of a path of length 4:

[Figure: ways of identifying two copies of a path of length 4.]

Let $\mathcal{H}^t_{(T,R)}$ be the family of graphs that may be obtained in this way.


Definition 127
For a tree with roots $(T, R)$, let the density $\rho(T, R)$ be

$$\rho(T, R) = \frac{|E(T)|}{|V(T)| - |R|}.$$

The idea with this quantity is that $|V(T)| - |R|$ is the number of “additional vertices” beyond the roots that each new copy of $T$ may have, and this density is always at least 1.
Definition 128
A tree $(T, R)$ is balanced if for every nonempty subset $S \subseteq V(T) \setminus R$, we have

$$\frac{\text{number of edges with at least one vertex in } S}{|S|} \ge \rho(T, R).$$

In other words, there are always a reasonable number of edges touching $S$. For example, taking $S = V(T) \setminus R$ itself, we get equality because all edges touch at least one vertex in $V(T) \setminus R$. And we're now ready to state the main result we'll prove today:


Theorem 129 (Bukh-Conlon)
Suppose $(T, R)$ is a balanced tree. Then for sufficiently large $t \in \mathbb{N}$, we have

$$\mathrm{ex}\left(n, \mathcal{H}^t_{(T,R)}\right) = \Theta_{T,R,t}\left(n^{2 - 1/\rho(T,R)}\right).$$


In particular, if we can show this result, then we can prove Theorem 126 by showing that we can have p(T, R) be 
any rational number larger than 1 (by attaching roots to a tree in a way that vaguely resembles the Euclidean algorithm 


but is completely explicit — we won't do it here). 


Example 130

Suppose $(T, R)$ is a star graph with $s$ leaves and a single center vertex, and set $R$ to be the set of all leaves. Then notice that $\mathcal{H}^t_{(T,R)} = \{K_{s,t}\}$ (each center vertex must be different when we overlay the graphs), and if we calculate $\rho(T, R) = s$ we will find that the exponent $2 - 1/s$ is in agreement with the results we proved last week.


Example 131

Suppose $(T, R)$ is a path of $k$ edges with $R$ being the two endpoints of the path. We can check that this graph is balanced, which gives us a generalization of part (3) of Fact 124. Specifically, Bukh and Conlon's result gives us the lower bound because it constructs an explicit example of a graph which avoids the $\ell$ paths of length $k$ all glued together at the endpoints.
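The density and the balancedness condition are both easy to compute by brute force for small rooted trees, so we can verify the two examples above directly. The helper names below are invented for this sketch, and trees are represented simply as edge lists.

```python
from fractions import Fraction
from itertools import combinations


def density(edges, vertices, roots):
    """rho(T, R) = |E(T)| / (|V(T)| - |R|), as an exact fraction."""
    return Fraction(len(edges), len(vertices) - len(roots))


def is_balanced(edges, vertices, roots):
    """Check every nonempty S of non-root vertices: (#edges touching S) / |S| >= rho."""
    rho = density(edges, vertices, roots)
    non_roots = [v for v in vertices if v not in roots]
    for size in range(1, len(non_roots) + 1):
        for subset in combinations(non_roots, size):
            chosen = set(subset)
            touching = sum(1 for u, v in edges if u in chosen or v in chosen)
            if Fraction(touching, size) < rho:
                return False
    return True


def star(s):
    """Star with center "c" and leaves 0..s-1; the roots are the leaves."""
    edges = [("c", i) for i in range(s)]
    return edges, ["c"] + list(range(s)), set(range(s))


def path(k):
    """Path with k edges on vertices 0..k; the roots are the two endpoints."""
    edges = [(i, i + 1) for i in range(k)]
    return edges, list(range(k + 1)), {0, k}
```

For the star we get $\rho = s$, and for the path with $k$ edges we get $\rho = \frac{k}{k-1}$, so the exponent $2 - 1/\rho$ equals $1 + \frac{1}{k}$, consistent with part (3) of Fact 124.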


We'll just do the proof of the lower bound. The upper bound essentially comes from considering the average degree of the vertices and constructing a subgraph in which all vertices have at least half that degree (only losing a constant factor), at which point we can construct many copies of $T$ and thus some choice of root vertices $R$ will give us a graph in $\mathcal{H}^t_{(T,R)}$.

This proof will follow a similar randomized algebraic construction as last lecture's proof, and we need some technical results first:


Lemma 132
For any balanced tree $(T, R)$, any $H \in \mathcal{H}^t_{(T,R)}$ has “many edges:”

$$|E(H)| \ge \rho(T, R)\,(|V(H)| - |R|).$$

(For example, if we glue all copies of $T$ distinctly, we get equality.) Essentially, the proof of this is induction on $t$: the base case $t = 1$ is clear, and for the inductive step we're adding a new copy of $T$. But if $S$ is the set of new vertices, we introduce $|S|$ new vertices and at least $\rho(T, R)|S|$ new edges because the tree is balanced.
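As a sanity check of the inequality, we can enumerate every way of gluing two copies of a path with $k$ edges along its endpoints and test it in the cross-multiplied form $a\,|E(H)| \ge b\,(|V(H)| - |R|)$ with $a = k - 1$ and $b = k$. The function below is an illustrative sketch with made-up names; it allows every injective relabeling of the second copy's interior vertices.

```python
from itertools import permutations


def check_two_copies(k):
    """Check a*|E(H)| >= b*(|V(H)| - |R|) for every gluing of two k-edge paths."""
    a, b = k - 1, k                      # non-root vertices and edges of one path
    interior = list(range(1, k))         # roots are the endpoints 0 and k
    first = {frozenset((i, i + 1)) for i in range(k)}
    fresh = [("new", i) for i in interior]
    results = []
    for image in permutations(interior + fresh, len(interior)):
        relabel = {0: 0, k: k, **dict(zip(interior, image))}
        second = {frozenset((relabel[i], relabel[i + 1])) for i in range(k)}
        if second == first:
            continue  # the two copies would coincide exactly
        edges = first | second
        extra_vertices = len(set(interior) | set(image))
        results.append(a * len(edges) >= b * extra_vertices)
    return results
```

For $k = 4$ there are $6 \cdot 5 \cdot 4 = 120$ relabelings, one of which is the identity, and the inequality holds in all $119$ remaining cases (with equality when the second copy uses only fresh vertices).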

Our next result is similar to the result about sizes of varieties that we used at the end of last class, and it follows 


from algebraic geometry (again this is because of having a “well-defined dimension” in some sense): 
Lemma 133
Suppose $f_1, \dots, f_k, g_1, \dots, g_\ell \in \mathbb{F}_p[y_1, \dots, y_s]$ are polynomials of degree at most $d$, and define

$$W = \{y \in \mathbb{F}_p^s : f_1(y) = \cdots = f_k(y) = 0\}, \qquad D = \{y \in \mathbb{F}_p^s : g_1(y) = \cdots = g_\ell(y) = 0\}.$$

Then the size of the set $W \setminus D$ satisfies either $|W \setminus D| \le \mathrm{const}(s, k, \ell, d)$ or $|W \setminus D| \ge p - \mathrm{const}(s, k, \ell, d)\sqrt{p}$ (in particular, in this latter case we have $|W \setminus D| \ge \frac{p}{2}$ for sufficiently large $p$).


(The idea is to reduce to the “irreducible” case, in which we know that the intersection of W and D is either W 


itself or of a lower dimension.) 


Proof of the lower bound of Theorem 129. To make our notation easier, we'll define $a = |V(T)| - |R|$ and $b = |E(T)|$, so that $\rho(T, R) = \frac{b}{a}$, and we'll also define $r = |R|$. Label the vertices of $T$ by the set $\{1, \dots, a + r\}$, such that the first $r$ vertices are the roots in $R$.

Let $q = 2br$ and $d = bq$ (if we remember from last time, $d$ will end up being the degree of our polynomial, and $q$ will be the moment that we consider). We need to construct a graph $G$ of $n$ vertices and $\Omega(n^{2 - 1/\rho(T,R)}) = \Omega(n^{(2b - a)/b})$ edges, and we'll do this, like last time, by constructing a bipartite graph $G$ of $2p^b$ vertices identified with two copies of $\mathbb{F}_p^b$ for some prime power $p$ (remember that doing this only loses us a constant factor). And this time, we'll decide whether we have edges between vertices by looking at multiple random polynomials (instead of just one): letting $f_1, \dots, f_a \in \mathbb{F}_p[x_1, \dots, x_b, y_1, \dots, y_b]$ be independent random polynomials of degree at most $d$, we will draw an edge between $x$ and $y$ if and only if $f_1(x, y) = \cdots = f_a(x, y) = 0$.


Since each $f_i(x, y) = 0$ occurs with probability $\frac{1}{p}$, the expected number of edges in our graph is

$$\mathbb{E}[|E(G)|] = (p^b)^2 \left(\frac{1}{p}\right)^a = p^{2b - a}.$$


(Notice that this is precisely the order of the number of edges that we want, since $n = 2p^b$.) We now need to avoid elements of $\mathcal{H}^t_{(T,R)}$, and we're going to do this by counting the number of bad $r$-tuples that could potentially be roots. In other words, mirroring the proof from last time, fix any $(z_1, \dots, z_r) \in (\mathbb{F}_p^b)^r$ (as a set of potential roots) and fix $z_1$ to be on the left side of the graph (we'll need to put in an additional factor of 2 later). Since $T$ is a tree, this fixes which side each root $z_i$ will be on, and now define $M(z_1, \dots, z_r)$ to be the set of copies of $T$ appearing in $G$, with root 1 on the left and with root $i$ mapped to $z_i$ for all $1 \le i \le r$ (its size is a random variable). Our ultimate goal is to avoid having $|M(z_1, \dots, z_r)| \ge t$, because that would give us $t$ copies of $T$ with the same roots $R$ and thus give us an element of $\mathcal{H}^t_{(T,R)}$. We have

$$\mathbb{E}[|M(z_1, \dots, z_r)|] \le (p^b)^a \cdot (p^{-a})^b = 1,$$


because intuitively there are at most $p^b$ potential places where each non-root vertex can go (which side it's on is fixed by the choice of root 1), there are $a$ such non-root vertices, and each edge occurs with probability $p^{-a}$. (Actually, because of independence arguments, we needed to use the result from last lecture, which says that the probability that any one of the $a$ polynomials vanishes on all $b$ edges is $p^{-b}$ because $b$ is sufficiently small.) And now just like last time, we do a moment bound: thinking about $q$-tuples of these potential trees, we sum over how many ($q'$) distinct trees actually occur, and we find that (after some calculation mirroring last lecture)

$$\mathbb{E}[|M(z_1, \dots, z_r)|^q] \le \sum_{q' \le q} (q')^q \sum_{H \in \mathcal{H}^{q'}_{(T,R)}} (p^b)^{|V(H)| - |R|} \cdot (p^{-a})^{|E(H)|}.$$
But now we can use the lower bound from Lemma 132: it gives $a\,|E(H)| \ge b\,(|V(H)| - |R|)$, so each summand is at most 1, and this expectation is bounded independently of $p$ by a constant $\mathrm{const}(T, R)$. And like last time, Markov's inequality alone is not enough, but we will apply Lemma 133 here: copies of $T$ that contribute to $M$ are indeed solutions of a combination of polynomial relations in $z_{r+1}, \dots, z_{r+a}$, because for example requiring that 1 and $r + 1$ are connected in the tree requires $f_1(z_1, z_{r+1}) = \cdots = f_a(z_1, z_{r+1}) = 0$, and for example requiring that $r + 1$ and $r + 2$ be not connected requires that $f_1(z_{r+1}, z_{r+2}), \dots, f_a(z_{r+1}, z_{r+2})$ are not all zero (or, if $r + 1$ is on the right and $r + 2$ is on the left in our fixing of the tree, that $f_1(z_{r+2}, z_{r+1}), \dots, f_a(z_{r+2}, z_{r+1})$ are not all zero). Additionally, to make sure we count exactly the quantity $|M(z_1, \dots, z_r)|$, we have to also make sure that we don't have a “degenerate” situation where we use the same vertex twice. So in particular, we must also require that whenever $i$ and $j$ are fixed to be on the same side of the tree for $r + 1 \le i \ne j \le r + a$, we have $z_i - z_j \ne 0$.


With that, the set $M(z_1, \dots, z_r)$ is indeed restricted as in Lemma 133, so we then find that $|M(z_1, \dots, z_r)|$ is either at most $\mathrm{const}(T, R)$ or at least $\frac{p}{2}$, so we can choose $t > \mathrm{const}(T, R)$. And just like last time, we therefore get by Markov's inequality that

$$\mathbb{P}(|M(z_1, \dots, z_r)| > t) = \mathbb{P}\left(|M(z_1, \dots, z_r)| \ge \frac{p}{2}\right) \le \frac{\mathbb{E}[|M(z_1, \dots, z_r)|^q]}{(p/2)^q} \le \mathrm{const} \cdot p^{-q} = \mathrm{const} \cdot p^{-2br}.$$

Thus the expected number of events where $|M(z_1, \dots, z_r)| > t$ over all possible $(z_1, \dots, z_r)$ (also accounting for $z_1$ being potentially on the left or the right) is at most $2 (p^b)^r \cdot \mathrm{const} \cdot p^{-2br}$, which is in particular bounded by a constant. The rest of the argument is now identical to last time: we have $p^{2b - a}$ edges in $G$ in expectation, and if we delete all edges from one of the vertices in each $(z_1, \dots, z_r)$ with $|M(z_1, \dots, z_r)|$ too large, we delete at most $c\, p^b$ edges in expectation. Since $a < b$, the number of deleted edges is small for large $p$, and thus (for sufficiently large $t$ and $p$) there is some choice of polynomials $f_1, \dots, f_a$ such that we have $\Theta(p^{2b - a}) = \Theta(n^{(2b - a)/b}) = \Theta(n^{2 - 1/\rho(T,R)})$ edges and avoid all subgraphs in $\mathcal{H}^t_{(T,R)}$, as desired.